The stability to instability transition in the structure of large scale networks 
We examine phase transitions between the "easy," "hard," and the "unsolvable" phases when 
attempting to identify structure in large complex networks ("community detection") in the pres- 
ence of disorder induced by network "noise" (spurious links that obscure structure), heat bath 
temperature T, and system size N . The partition of a graph into q optimally disjoint subgraphs 
or "communities" inherently requires Potts type variables. In earlier work [Phil. Mag. 92, 406 
(2012)] when examining power law and other networks (and general associated Potts models), we 
illustrated transitions in the computational complexity of the community detection problem typi- 
cally correspond to spin-glass-type transitions (and transitions to chaotic dynamics in mechanical 
analogs) at both high and low temperatures and/or noise. When present, transitions at low temper- 
ature or low noise correspond to entropy driven (or "order by disorder") annealing effects wherein 
stability may initially increase as temperature or noise is increased before becoming unsolvable at 
sufficiently high temperature or noise. Additional transitions between contending viable solutions 
(such as those at different natural scales) are also possible. Community structure may also be 
identified via a dynamical approach where "chaotic-type" transitions were earlier found. The correspondence 
between the spin-glass-type complexity transitions and transitions into chaos in dynamical analogs 
might extend to other hard computational problems. In this work, we examine large networks (with 
a power law distribution in cluster size) that have a large number of communities (q ≫ 1). We infer 
that large systems at a constant ratio of q to the number of nodes N asymptotically tend toward 
insolvability in the limit of large N for any positive T. The asymptotic behavior of temperatures 
below which structure identification might be possible, T_× = O[1/log q], decreases slowly, so for 
practical system sizes, there remains an accessible, and generally easy, global solvable phase at low 
temperature. We further employ multivariate Tutte polynomials to show that increasing q emulates 
increasing T for a general Potts model, leading to a similar stability region at low T. Given the 
relation between Tutte and Jones polynomials, our results further suggest a link between the above 
complexity transitions and transitions associated with random knots. 

PACS numbers: 89.75.Fb, 64.60.Cn, 89.65.-s 



I. INTRODUCTION 

Applications of physics to networks [1] have opened
fascinating doors for enhancing our understanding of
these complex systems. In particular, community detection
[2] endeavors to identify pertinent structures within such
systems. Applications of the problem are exceptionally
broad, and numerous methods have been proposed to
attack the problem [3-11], some of which have been
compared for efficiency and accuracy [12-15].

Computational "phase transitions" have been studied
in many challenging problems [16-23]. Practical
implications of such studies abound (e.g., Refs. [16, 20, 24-26]),
and understanding the behavior of algorithmic solutions
to these problems is of interest because the knowledge
can be leveraged to understand when a particular
solution is computationally challenging, trustworthy, or
perhaps not obtainable due to either an inherent difficulty
or the required computational effort. Such knowledge may
be used in certain cases to predict the hard or unsolvable
regimes of the problem a priori (e.g., k-SAT [17]) or
perhaps, more practically in general, to dynamically adapt
the solver during the onset of a phase transition [27].

Earlier work related to computational phase transitions
with connections to clustering includes Refs. [28, 29], and
Ref. [30] reviewed some critical phenomena in complex
networks. The complexity of the energy landscape in
community detection was studied for a "fixed" Potts
model (model parameters are not set by the network
under study) [31, 32], modularity [33], and belief
propagation on block models [34]. The former and latter studies
explicitly identified phase transitions in the respective
systems. We extend a previous analysis [32] of a Potts
model where we studied the thermodynamic and com- 
plexity character resulting in two distinct transitions: an 
entropic stabilization transition where added complexity 
can result in "order by disorder" annealing and a high 
temperature disordered unsolvable phase. For extreme 
complexity (high noise) at low T, the system is again un- 
solvable. Additional transitions can appear between un- 
solvable and difficult solutions or contending partitions 
of natural network scales. Here, we seek to move be- 
yond characterizing the solvable / unsolvable transition to 
study the transitions in terms of changes in the energy 
landscape and thermodynamic functions as functions of 
temperature and "noise" (intercommunity edges). 

We utilize overlap parameters in the form of information
theory measures (see Appendix B) and a "computational
susceptibility" χ (see Appendix C). Using these
measures, we monitor increases in the number of local
minima corresponding to (often sharply) increased
computational complexity. We apply our Potts model to
solve a random graph with an embedded ground state,
and we identify phase transitions between "easy" and
"hard" solvable phases which transition into unsolvable
regions. Specifically, the normalized mutual information
(NMI) I_N, Shannon entropy H, the energy E, and χ
exhibit progressively sharper changes as the system size N
increases, suggesting the existence of genuine
thermodynamic transitions. Similar analysis can be done for other
community detection approaches. Many community de- 
tection methods will agree on the best solution within the 
easy phase, but the hard region presents a substantially 
more difficult challenge. 

The identified transitions may be connected to jamming
[25, 26] and avalanche (cascade) transitions [35-37]
in networks. Dynamic jamming transitions occur in
traffic, computer networks, and particulate matter (e.g.,
sand piles), and the glassy state in amorphous solids may
be caused by similar behavior. Refs. [38-40] showed
relations between clustering and cascades in certain
networks, and Ref. [41] relates agent dynamics to the
Kuramoto oscillators model which has been used for
community detection [42]. The threshold emergence of
Giant Connected Components (GCC) is related to epidemic
thresholds [43, 44], and by nature of the emerging global
connectivity, the GCC is directly detectable via clustering
at large-scale resolutions [i.e., small γ in Eq. (1)].
Jones polynomials in knot theory are related to Tutte
polynomials for the Potts model, so our results suggest
similar transitions in random knots (see Appendix G).

We will analytically investigate partition functions and 
free energies of several graphs in the high temperature 
T and large number of communities q approximations. 
We illustrate that increasing T emulates increasing q for 
a general system, and the analytical results are consistent 
with the computational phase diagrams. 

The remainder of the paper is organized as follows: We
introduce the community detection model in Sec. II and
then the embedded graph/noise test in Sec. III. Section
IV demonstrates the spin-glass-type transitions that
occur in our community detection problem via numerical
simulation using several instability measures. In Sec. V,
we derive crossover thresholds for a simple case and
discuss their connections to the numerical simulations, and
Sec. VI demonstrates the effect of the different solution
regions with a specific example. Section VII carries out
analytic free energy calculations on arbitrary unweighted
graphs using a ferromagnetic Potts model. Appendix A
examines the notion of "trials" and "replicas," which are
of paramount importance in our work to directly probe the
phase diagram without the use of mean-field type or other
approximations concerning complexity, and defines
some terminology used in the paper. Appendix B
and Appendix C describe our information and stability
measures, and Appendix D elaborates on our heat bath




FIG. 1: (Color online) The figure illustrates a partition where
nodes are separated into distinct communities as indicated by
distinct shapes and colors, thus identifying relevant structure
in the graph. The current work elaborates on computational
transitions and disorder in terms of noise (extraneous
intercommunity edges) or thermal effects (high temperature T or large
system size N) of solving such systems using a stochastic heat
bath solver (see Appendix D).



community detection algorithm. We introduce the Tutte
polynomial method for calculating the partition function
of a Potts model for unweighted and weighted graphs in
Appendix E, and we show an exact calculation for a simple
connected graph in Appendix F. Finally, Appendix G
conjectures the existence of a similar transition for knots.



II. POTTS HAMILTONIAN 

We employ a spin-glass-type Potts model Hamiltonian 
for solving the community detection problem 



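H(\{\sigma\}) = -\frac{1}{2} \sum_{i \neq j} \left[ A_{ij} - \gamma \left( 1 - A_{ij} \right) \right] \delta(\sigma_i, \sigma_j) \qquad (1)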



which we refer to as an "Absolute Potts Model" (APM).
Given N nodes, A_ij denotes the adjacency matrix where
A_ij = 1 if nodes i and j are connected and is 0 otherwise.
In general, A_ij may be trivially extended to a weighted
adjacency matrix W_ij (perhaps including "adversarial"
relations) [31], but we utilize unweighted graphs in most
of the current work (see Sec. VI). Each spin σ_i may
assume integer values in the range 1 ≤ σ_i ≤ q, where q
is the (dynamic) number of communities, and node i
is a member of community k when σ_i = k. In the
current work, we set the resolution parameter [8] to γ = 1,
which is near an optimal value for communities with high
internal edge densities (see Sec. III).
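
As an illustration of how the APM energy is evaluated, the following minimal sketch assumes that Eq. (1) rewards each connected pair of nodes in the same community by -1 and penalizes each unconnected pair in the same community by +γ, with every unordered pair counted once. The function name `apm_energy` and the list-of-lists adjacency format are assumptions made for this example and are not taken from the paper.

```python
from itertools import combinations


def apm_energy(adjacency, sigma, gamma=1.0):
    """Energy of a partition under the assumed form of Eq. (1).

    adjacency: symmetric list of lists with 0/1 entries and a zero diagonal
    sigma:     community label of each node
    gamma:     resolution parameter (the text uses gamma = 1)
    """
    energy = 0.0
    # Summing over unordered pairs i < j equals (1/2) * sum over i != j.
    for i, j in combinations(range(len(sigma)), 2):
        if sigma[i] == sigma[j]:  # Kronecker delta of the two spins
            a_ij = adjacency[i][j]
            energy -= a_ij - gamma * (1 - a_ij)
    return energy


# Two triangles joined by a single "noise" edge between nodes 2 and 3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = [[0] * 6 for _ in range(6)]
for i, j in edges:
    A[i][j] = A[j][i] = 1

planted = apm_energy(A, [0, 0, 0, 1, 1, 1])  # six intracommunity edges
merged = apm_energy(A, [0, 0, 0, 0, 0, 0])   # seven edges, eight missing links
```

In this toy graph the planted two-triangle partition has a lower energy than the merged partition, because merging pays the γ penalty for every missing link between the triangles.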

Previous work [31] elaborated on a "zero-temperature"
(T = 0) community detection algorithm
which we used to minimize Eq. (1). A depiction of
community structure is shown in Fig. 1, where different
communities are represented by different node shapes and
colors. Here, we investigate the Hamiltonian at non-zero
FIG. 2: (Color online) The schematic illustrates different
minimizers attempting to solve a system. Colored spheres
represent distinct minimizers ("replicas") that seek a (perhaps
local) minimum of a cost function. In an easy system, multiple
solution attempts will generally reach a good solution (such as
the bottom left region of the landscape), but hard systems
require more effort to solve accurately (that is, to achieve strong
agreement between the replicas). Unsolvable regions restrict
accurate solutions without extreme levels of optimization (such
as exhaustive search).



temperatures (T > 0) by applying a heat bath algorithm
(HBA, see Appendix D).

We further invoke s independent solutions ("trials," see
Appendix A) by solving copies of the system which differ
by a permutation of the order of the spin indices. This
process leads to states that (perhaps locally) minimize
Eq. (1), so we select the lowest energy trial as the best
solution. We vary s in the range 4 ≤ s ≤ 20, where we
employ s = 4 trials in general and use s > 4 trials for
calculating the computational susceptibility in Eq. (C1).
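
The trial procedure can be sketched as follows. This is a hypothetical illustration rather than the solver used in the paper: it assumes the pairwise form of Eq. (1) described in Sec. II, starts every node in its own community, and uses a simple greedy zero-temperature sweep (`greedy_trial`) that moves a node whenever the move strictly lowers the energy. All function names are invented for this example.

```python
import random
from itertools import combinations


def apm_energy(adjacency, sigma, gamma=1.0):
    """Assumed form of Eq. (1), counting each unordered node pair once."""
    energy = 0.0
    for i, j in combinations(range(len(sigma)), 2):
        if sigma[i] == sigma[j]:
            energy -= adjacency[i][j] - gamma * (1 - adjacency[i][j])
    return energy


def gain(adjacency, sigma, i, community, gamma):
    """Energy lowering obtained by placing node i in `community`."""
    return sum(
        adjacency[i][j] - gamma * (1 - adjacency[i][j])
        for j in range(len(sigma))
        if j != i and sigma[j] == community
    )


def greedy_trial(adjacency, order, gamma=1.0, max_sweeps=100):
    """Hypothetical T = 0 solver: sweep nodes in `order` until no move helps."""
    n = len(adjacency)
    sigma = list(range(n))  # every node starts in its own community
    for _ in range(max_sweeps):
        moved = False
        for i in order:
            used = set(sigma)
            candidates = set(used)
            unused = [c for c in range(n) if c not in used]
            if unused:
                candidates.add(unused[0])  # option of an empty community (gain 0)
            current = gain(adjacency, sigma, i, sigma[i], gamma)
            best = max(candidates, key=lambda c: gain(adjacency, sigma, i, c, gamma))
            if gain(adjacency, sigma, i, best, gamma) > current:
                sigma[i] = best  # strict improvement only, so the energy drops
                moved = True
        if not moved:
            break
    return sigma


def best_of_trials(adjacency, s=4, gamma=1.0, seed=0):
    """Run s trials that differ only in node order; keep the lowest energy."""
    rng = random.Random(seed)
    best_sigma, best_energy = None, float("inf")
    for _ in range(s):
        order = list(range(len(adjacency)))
        rng.shuffle(order)  # permutation of the spin indices
        sigma = greedy_trial(adjacency, order, gamma)
        energy = apm_energy(adjacency, sigma, gamma)
        if energy < best_energy:
            best_sigma, best_energy = sigma, energy
    return best_sigma, best_energy
```

Because each trial may stop in a different local minimum, keeping the lowest energy result over s trials is a cheap way to add optimization effort, which is the quantity the computational susceptibility responds to.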

In our multi-scale ("multiresolution") analysis, we
solve r = 100 independent "replicas" (see Appendix A)
and examine information theory correlations between the
replicas and the planted ground state solutions. We
schematically show such a set of independent solvers in
Fig. 2, where stronger agreement among the replicas
indicates a more robust solution. We compute the average
inter-replica information correlations among the
ensemble of replicas, allowing us to infer a more detailed picture
of the system beyond that of a single optimized solution.
Specifically, information theory extrema as a function of
T and γ (or other scale parameters in general) correspond
to the most relevant scale(s) of the system.
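
The average inter-replica correlation can be sketched as below. The example assumes the common symmetric normalization I_N = 2 I(A, B) / [H(A) + H(B)] for the normalized mutual information; the precise definitions used in the paper are those of Appendix B, and the function names here are illustrative only.

```python
from collections import Counter
from itertools import combinations
from math import log


def entropy(labels):
    """Shannon entropy of a partition given as a list of community labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two partitions of the same node set."""
    n = len(a)
    count_a, count_b, joint = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def nmi(a, b):
    """Normalized mutual information, assumed here to be 2 I / (H_a + H_b)."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a + h_b == 0.0:
        return 1.0  # both partitions consist of a single community
    return 2.0 * mutual_information(a, b) / (h_a + h_b)


def average_replica_nmi(replicas):
    """Average NMI over all distinct pairs of replica partitions."""
    pairs = list(combinations(replicas, 2))
    return sum(nmi(a, b) for a, b in pairs) / len(pairs)
```

An average close to 1 indicates that the replicas agree on essentially the same partition, whereas a low average signals a disordered landscape in which independent solvers settle into unrelated minima.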



III. CONSTRUCTION OF EMBEDDED 
GRAPHS AND THE NOISE TEST 

Similar to Ref. [45], we construct a "noise test" benchmark
as a medium in which to study phase transitions in random
graphs with embedded solutions [31, 32]. We define
the system "noise" as intercommunity edges that con- 
FIG. 3: (Color online) Panel (a) schematically illustrates in
one dimension the easy and hard phases induced by the level
of noise (extraneous intercommunity edges) encountered by
a solver. Greedy algorithms are easily trapped in local
energy minima above a certain noise threshold. We previously
showed that the model of Eq. (1) is robust to noise [31] even
with a greedy algorithm. Stochastic solvers such as a heat
bath algorithm (see Appendix D) or simulated annealing
enable one to circumvent the effects of some noise, but excessive
levels will still thwart these solvers because meaningful
partition information is obscured by the complexity of the energy
landscape. Panels (b) and (c) schematically depict the easy
and hard phases in terms of the temperature for the stochastic
heat bath solver (see Appendix D). Above a graph-dependent
threshold, the solver is less sensitive to local energy landscape
features.



nect a given node to communities other than its original 
or "best" community assignment. In general [31], it is
not possible at the beginning of an attempted solution to
ascertain which edges contribute to noise and which
constitute edges within communities of the best partition(s).

For each benchmark graph, we divide N nodes into q
communities with a power law distribution of community
sizes {n_i} given by n^β where β = −1. We then connect
"intracommunity" edges at a high average edge density
p_in = 0.95. Initially, the external edge density is zero,
p_out = 0, so that we have perfectly decoupled clusters.
To this system, we add random intercommunity edges
at a density of p_out ≤ 0.5. We define p_in (p_out) as the
ratio of the number of intracommunity (intercommunity)
edges over the maximum possible number of intracommunity
(intercommunity) edges.
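
A minimal sketch of the noise test construction is given below. It makes two simplifying assumptions that are not stated in the text: the community sizes are supplied by the caller (drawing them from the power law is left out), and each possible edge is placed independently with probability p_in or p_out, so the realized densities fluctuate around the target values. The function names are hypothetical.

```python
import random
from itertools import combinations


def noise_test_graph(sizes, p_in=0.95, p_out=0.1, seed=0):
    """Planted-partition graph: dense communities plus intercommunity noise.

    sizes: list of community sizes (assumed to be chosen beforehand)
    Returns the adjacency matrix and the planted community labels.
    """
    rng = random.Random(seed)
    labels = [c for c, size in enumerate(sizes) for _ in range(size)]
    n = len(labels)
    adjacency = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        # Assumption: edges are placed independently with the target density.
        p = p_in if labels[i] == labels[j] else p_out
        if rng.random() < p:
            adjacency[i][j] = adjacency[j][i] = 1
    return adjacency, labels


def measured_densities(adjacency, labels):
    """Realized (p_in, p_out): edges divided by the maximum possible edges."""
    intra = inter = intra_max = inter_max = 0
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            intra_max += 1
            intra += adjacency[i][j]
        else:
            inter_max += 1
            inter += adjacency[i][j]
    return intra / intra_max, inter / inter_max
```

Sweeping `p_out` upward at fixed `sizes` and `p_in` reproduces the kind of scan over noise levels described here, with the planted labels available for comparison against any recovered partition.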

We define the average external degree of each node Z_out
as the average number of links that a given node has
with nodes in communities other than its own. Similarly,
the average internal degree Z_in is defined as the average
number of links to nodes in the same community, and
FIG. 4: (Color online) The figure schematically illustrates the
convergence time of a solver in panel (a) and the effect of
additional optimization trials in panel (b). Additional
optimization trials are utilized in a "computational susceptibility" χ
in order to numerically estimate the complexity of the energy
landscape (see Appendix C).



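Z_in + Z_out = Z where Z is the average coordination
number. Then we can explicitly write the internal and
external edge densities

p_{in} = \frac{N Z_{in}}{\sum_{a=1}^{q} n_a (n_a - 1)} \qquad (2)

and

p_{out} = \frac{N Z_{out}}{\sum_{a=1}^{q} \sum_{b \neq a} n_a n_b} \qquad (3)

where n_a denotes the size of community a.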
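
The relation between average degrees and edge densities can be checked numerically. The sketch below assumes that Eqs. (2) and (3) divide the N Z_in / 2 intracommunity edges and the N Z_out / 2 intercommunity edges by the corresponding maximum numbers of node pairs, which follows from the definitions of p_in and p_out given above; the function name `edge_densities` is illustrative.

```python
def edge_densities(z_in, z_out, sizes):
    """Convert average internal/external degrees into edge densities.

    sizes: list of community sizes n_a, with N = sum(sizes)
    """
    n_nodes = sum(sizes)
    # Ordered pairs inside communities: sum_a n_a (n_a - 1).
    intra_pairs = sum(n_a * (n_a - 1) for n_a in sizes)
    # Ordered pairs across communities: sum_a n_a (N - n_a).
    inter_pairs = sum(n_a * (n_nodes - n_a) for n_a in sizes)
    p_in = n_nodes * z_in / intra_pairs
    p_out = n_nodes * z_out / inter_pairs
    return p_in, p_out


# Two complete triangles joined by one edge: Z_in = 2 and Z_out = 2 / 6.
p_in, p_out = edge_densities(2.0, 2.0 / 6.0, [3, 3])
```

For this toy case the internal density is 1 (all three possible edges in each triangle are present), and the external density is 1/9 (one of the nine possible edges between the triangles).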

The communities in this construction are well defined,
on average, at reasonable levels of noise (p_out ≲ 0.3,
depending on the typical community size n). As external
links are progressively added to the system (p_out
increases), the communities become increasingly difficult
to detect. At some stage, enough noise is added and p_out
is sufficiently high that the planted partition cannot be
detected despite the fact that the optimal ground state
is still well-defined. This transition often occurs sharply,
particularly for large networks. We investigate the phase
transition from the solvable to unsolvable phases at both
low and high temperatures by means of the heat bath
algorithm described in Appendix D.



and others. Compared to another Potts-type quality
function [6] utilizing a "null model" (a random graph
used to evaluate the quality of a candidate partition),
the APM exhibits a somewhat sharper transition as N is
increased [31]. As alluded to above, two transitions are
generally encountered as the noise value (or temperature)
is increased. At fixed temperature T, as p_out is steadily
increased from zero, spin-glass behavior
first appears for values p_1 ≤ p_out ≤ p_2.

Figure 3(a) illustrates a one-dimensional characterization
of the easy and hard phases in terms of the level of
noise (extraneous intercommunity edges) encountered by
a greedy solver. It is in this context that greedy
algorithms are, in general, more easily trapped in local
energy minima above a certain noise threshold. Stochastic
solvers such as the heat bath algorithm discussed in Appendix
D or simulated annealing (SA) enable one to circumvent
noise to some extent, but excessive levels will thwart even
these more robust solvers because meaningful information
is eventually obscured by the complexity of the
energy landscape. Figures 3(b) and 3(c) depict the easy and hard
phases at low and high temperatures T, respectively, for
our HBA (see Appendix D). Above a graph-dependent
threshold, the solver is insensitive to local features, and
it is unable to find an accurate solution.

We showed that Eq. (1) is robust to noise [31], leading
to exceptional accuracy even with a greedy algorithm.
Some other methods and cost functions [4, 46] have also
proven to be very accurate [15] with a greedy-oriented
algorithm. While maximizing modularity [47] and a closely
related cost function in [6] have proven to be accurate and
productive, Refs. [33, 48, 49] have discussed problems
associated with maximizing modularity in community
detection. We briefly illustrated a correspondence
between the major transition experienced by Eq. (1) and
a Potts model in [6]. We conjecture the existence of a
related transition for random knots in Appendix G.

In Secs. IV A and IV B, we elaborate on the transitions
using a computational susceptibility χ defined in
Appendix C. In analogy with other physical susceptibility
parameters, χ measures the response of the system to
additional optimization effort. We schematically
illustrate the effect in Fig. 4. A higher χ indicates a more
disordered, but navigable, energy landscape, whereas a low
χ indicates that additional optimization has less effect,
whether due to extreme disorder or a trivially solvable
system. Finally, in Sec. IV C, we illustrate the transitions
using additional stability measures.



IV. SPIN GLASS TYPE TRANSITIONS 

We previously reported [32] on the existence of two
spin-glass-type transitions in the constructed graphs
mentioned in Sec. III. Evidence for the transitions is
observed in several measures such as the accuracy of the
solution obtained by means of the APM in Eq. (1) (and
other models [6, 31] in general), the computational effort
required to converge to a solution [3, 31], entropy effects.



A. χ(T, p_out) at fixed α = q/N



We show the phase transitions in terms of three-dimensional
(3D) plots of the computational susceptibility
χ(T, p_out) for a range of system sizes N and numbers
of communities q. First, we fix the ratio α = q/N
and study the phase transitions as N increases. Then we
test a range of systems with fixed q as N increases.
[Fig. 5 panel titles: (a) N = 256, q = 4, α = 0.016; (b) N = 512, q = 8, α = 0.016; (c) N = 1024, q = 16, α = 0.016; (d) N = 2048, q = 32, α = 0.016; (e) N = 256, q = 18, α = 0.07; (f) N = 512, q = 35, α = 0.07; (g) N = 1024, q = 70, α = 0.07; (h) N = 2048, q = 140, α = 0.07]




FIG. 5: (Color online) Each panel shows a 3D plot of χ(T, p_out) as a function of temperature T and noise level p_out for systems
with the indicated number of nodes N, communities q, and α = q/N ratio. In panels (a-h) for α = 0.016 and 0.07, all plots show
three clear phases, and the "ridges" at low and high temperatures mark the hard phase. The hard phase separates the easy
phase (the flat region in the lower left corner with low temperature and low noise) from the unsolvable phase (the flat region in
the upper right corner with high temperature and high noise). In panels (a)-(h), the ridges in χ(T, p_out) become narrower as N
increases. The area of the easy (hard) regions decreases (increases) from panel (a) to (d) and (e) to (h), respectively. In panels
(a-d) for α = 0.016, the hard phase at low temperature becomes less prominent from panel (a) to (d), but it becomes more
prominent at high temperature. In panels (e-h) for α = 0.07, the hard phase at low temperature becomes more prominent
from panel (e) to (h), but it remains constant at high temperature. In panels (i-l) for α = 0.15, only the larger systems with
N ≥ 512 show clear phases. The smaller systems with N = 128 in panel (i) and N = 256 in panel (j) show very noisy phases
where only the easy phase can be readily determined, and the boundaries for the hard and unsolvable phases are difficult to
pinpoint.



1. χ(T, p_out) at α = 0.016

In Fig. 5, panels (a) through (d), we begin the analysis
at a small α = q/N = 0.016 ratio. The results for four
system sizes are shown: N = 256, N = 512, N = 1024,
and N = 2048, which maintain a fixed ratio of α across
the respective rows. Each plot shows the easy, hard, and
unsolvable phases.

The two "ridges" in each plot denote the hard phases. 
The height of the first ridge at low temperature decreases 
as the system size increases while the height of the second 
ridge at high temperature increases in the same process. 
This finite size scaling behavior for the hard phase at high 
temperature indicates that the phase transition at high 
FIG. 6: (Color online) Corresponding to Fig. 5 and Sec. IV A, each plot depicts the boundaries of the hard phase for the system
series with a fixed α = q/N ratio. Panels (a), (b), and (c) show the results for α = 0.016, α = 0.07, and α = 0.15, respectively.
System sizes range from N = 256 to 2048, and q varies from 4 to 160 as indicated in each plot. For each α, the area within the
hard phase boundary becomes progressively narrower, indicating that the transitions from the easy to unsolvable phases are
clearer in the thermodynamic limit.




FIG. 7: (Color online) Corresponding to Fig. 5 and Sec. IV A, each plot depicts the first phase transition point p_1 as a function
of the temperature T for systems with a fixed ratio of α = q/N. Panels (a), (b), and (c) show the results for α = 0.016,
α = 0.07, and α = 0.15, respectively. System sizes range from N = 256 to 2048, and q varies from 4 to 160 as indicated in each
plot. All panels show that when α is fixed, the value of the first transition point p_1 decreases as the system size increases. This
behavior further indicates that the system becomes more complex to solve in the thermodynamic limit.



temperature exists in the thermodynamic limit. However,
the phase transition at low temperature will disappear
in the same limit. In the meantime, the ridge at
high temperature will gradually expand into the low
temperature region as the system size increases. Thus,
for the systems with a small ratio α, the phase
transition will exist in almost the entire temperature range
in the thermodynamic limit (see Sec. V).

The "easy" phase shrinks and the unsolvable phase
expands as N increases. In detail, the approximate area
of the easy phase on the left corner in panel (a) is in
the range of T ∈ (0, 20) and p_out ∈ (0, 0.4). The area of
the unsolvable phase on the right upper corner is in the
range of T ∈ (20, +∞) and p_out ∈ (0, 0.4). As the system
size increases from N = 256 in panel (a) to N = 1024 in
panel (c), the area of the easy phase shrinks to the range
of T ∈ (0, 5) and p_out ∈ (0, 0.4) while the unsolvable
phase expands to T ∈ (5, +∞) and p_out ∈ (0, 0.4). As the
system size further increases to N = 2048 in panel (d),
the easy phase further shrinks to the range of T ∈ (0, 4)
and p_out ∈ (0, 0.4) while the unsolvable phase expands to
T ∈ (4, +∞) and p_out ∈ (0, 0.4). We note that the range
of p_out for the easy phase does not decrease as the system
size increases.

In order to track the range of the hard phases, we
further display a set of "boundary" plots in Fig. 6 as
well as the first transition point p_1 as a function of
temperature in Fig. 7. For the system series with the
fixed α = 0.016 discussed above, the 2D "hard phase"
boundaries and the values of the first transition points
are in panel (a) of Fig. 6 and Fig. 7, respectively.

In Fig. 6(a), the area of the hard phase shrinks, and its
area at high temperature becomes narrower as the system
size increases. Specifically, the width of the hard phase
for N = 256 is about ΔT = 6, while it only extends to
ΔT = 1 for N = 2048. Together with the 3D phase
diagrams in panels (a)-(d) of Fig. 5, we conclude that
the hard phase at high temperature becomes sharper
FIG. 8: (Color online) Corresponding to Fig. 5 and Sec. IV A, the convergence time τ [see Fig. 4(a)] as a function of noise p_out
at zero temperature for the systems with a fixed ratio of α = q/N. Panels (a), (b), and (c) show the results for α = 0.016,
α = 0.07, and α = 0.15, respectively. System sizes range from N = 256 to N = 2048, and q varies from 4 to 160 as indicated in
each plot. The noise level p_out at the first peak of the convergence time corresponds to the first transition point p_1 in Fig. 7 at
zero temperature. As the system size increases, the first peak in the convergence time moves to the left. The peaks share the same
trend as in Fig. 6 and Fig. 7.



in the thermodynamic limit. 

The boundaries of the hard phase at low temperature
are more easily seen in Fig. 7(a), where we plot the first
transition point p_1 as a function of temperature T for a
range of systems. The plots confirm the observations in
Fig. 5(a)-(d) regarding the constant p_out range. That is,
the range of p_out for the easy phase does not decrease as
the system size increases [in Fig. 7(a), the p_1 curves collapse
for T < 5 for all the systems]. This behavior hints that the
first transition point p_1 at low temperature and small α
remains constant in the thermodynamic limit.

As depicted in Fig. 4(a), the convergence time τ
provides another view of the phase transition. We plot τ
as a function of noise level p_out in Fig. 8(a) for systems
with a fixed ratio of α = q/N = 0.016. The value of p_out
at the first peak of the convergence time in each system
is consistent with the first transition point p_1 observed
in Fig. 7(a). As the system size increases, the peak
convergence time shifts to the left, which corresponds to a
lower value of p_1.



2. χ(T, p_out) at α = 0.07

For α = 0.07, the phase transitions are presented in
Fig. 5, panels (e) through (h). The phases in panel (e)
are noisy compared to panels (f) through (h), and all of
the systems are more complicated than the plots with
α = 0.016. As N increases, the phase transitions become
clearer. However, contrary to panels (a) through (d),
the phase transition at low temperature becomes more
prominent as N increases, and the transition at high
temperature stays roughly constant. Specifically, the height
of the susceptibility peak at low temperature increases
from χ = 0.01 at N = 256 in panel (e) to χ = 0.05 at
N = 512 in panel (f) and χ = 0.1 at N = 1024 in panel (g),
and finally reaches χ = 0.2 in panel (h) with N = 2048.
The phase transitions in this series appear to be persistent.

The easy phase (lower left of each panel) decreases
in area as the system size increases. This is the same
trend that was observed in the previous α = 0.016
series, implying that the easy phase will tend to decrease
in the thermodynamic limit up to a threshold (see Sec.
V). Specifically, the easy phase in the smallest system in
panel (e) covers the range of T ∈ (0, 3) and p_out ∈ (0, 0.3),
while that in the largest system in panel (h) covers T ∈ (0, 1.5)
and p_out ∈ (0, 0.2). The range for p_out in the easy
phase decreases as N increases, which differs from
the α = 0.016 data where the noise p_out stayed at a
roughly constant range of p_out ∈ (0, 0.4). In both series
for α = 0.016 and 0.07, the value of the initial transition
point p_1 decreases in the thermodynamic limit.

The corresponding 2D plots of the hard phase
boundaries and the first transition points p_1 are displayed in
Fig. 6(b) and Fig. 7(b), respectively. For the series with
α = 0.07 in Fig. 6(b), the area of the hard phase becomes
narrower at both low and high temperatures as the
system size increases. In detail, the width of the hard phase
for N = 256 is about ΔT = 1.3, while the width shrinks
to about ΔT = 0.3 at N = 2048. Together with the 3D
phase diagrams in Fig. 5(e)-(h), the phase transitions
become sharper in the thermodynamic limit.

As shown in Fig. 7(b), the first transition point p_1
decreases as the system size increases, even in the low
temperature limit. This is consistent with the first peak
of the convergence time τ at zero temperature in Fig.
8(b). This indicates that the system becomes progressively
harder to solve in the thermodynamic limit over
the whole temperature range.



3. χ(T, p_out) at α = 0.15

In panels (i) through (l) of Fig. 5, α = 0.15 and the
clusters are smaller on average, resulting in systems that



[Fig. 9 panel titles: (a) N = 256, q = 16; (b) N = 512, q = 16; (c) N = 1024, q = 16; (d) N = 2048, q = 16; (e) N = 256, q = 40; (f) N = 512, q = 40; (g) N = 1024, q = 40; (h) N = 2048, q = 40; (i) N = 512, q = 70; (j) N = 800, q = 70; (k) N = 1024, q = 70; (l) N = 2048, q = 70]



FIG. 9: (Color online) Similar to Fig. 5, we plot χ(T, p_out) as a function of temperature T and noise level p_out for systems
with the indicated number of nodes N, communities q, and α = q/N ratio. Here, q is fixed for each row series, and we vary α
(rows) to examine the behavior as N increases (columns). The heights of the susceptibility peaks at higher T increase across
each series as N increases whereas the heights at low T are relatively constant. The N = 256 node systems do not show clear
hard or unsolvable phases, but the transitions are strong at high temperature for most panels in the second and third columns
of plots.



are more difficult to solve. In panels (i) and (j), almost
the entire region is covered by small peaks, which
indicates mixing of the hard and unsolvable phases, thus
making the phase boundaries hard to detect.

The flat easy regions are recognizable in all panels,
but the area is small relative to the previous cases and
becomes even smaller as N increases into panel (l). In
panel (i), the flat easy region is roughly triangular with
legs along T ∈ (0, 1.5) and p_out ∈ (0, 0.2). The easy
region shrinks to a smaller triangle along T ∈ (0, 0.2)
and p_out ∈ (0, 0.2) in panels (j) and (k). In panel (l), it
further shrinks to T ∈ (0, 1) and p_out ∈ (0, 0.1). The
easy phase shrinks for both p_out and T as N increases,
which further indicates that the initial transition point
p_1 decreases substantially in the thermodynamic limit.



The corresponding plots of the hard phase boundaries
and the first transition points p_1 are displayed in Figs.
6(c) and 7(c), respectively. From Fig. 6(c), the area of
the hard phase shrinks in the thermodynamic limit. The
hard phase is more identifiable relative to the unsolvable
region as N increases. The initial transition point p_1
drops as N increases as shown in Fig. 7(c). The
convergence time τ for the systems with the fixed ratio of
α = q/N = 0.15 at zero temperature is shown in Fig. 8(c),
where the first peak of τ shifts to the left as the system
size increases. This is consistent with the trend observed
in Fig. 7(c). We further show in Fig. 14 and Fig. 15 that
the first transition points in "computational susceptibility,"
energy, entropy, convergence time, and normalized
mutual information are consistent with each other.
[Fig. 10 panels: (a) hard phase boundary for q = 16; (b) hard phase boundary for q = 40; (c) hard phase boundary for q = 70]



FIG. 10: (Color online) Corresponding to Fig. 9 and Sec. IV B, each plot depicts the boundaries of the hard phase for the system series with a fixed number of communities q where panels (a), (b), and (c) correspond to q = 16, 40, and 70, respectively. System sizes range from N = 256 to 2048 as indicated. For each q, the area of the hard phase becomes progressively narrower which indicates clearer transitions from the easy to unsolvable phases in the thermodynamic limit.




FIG. 11: (Color online) Corresponding to Fig. 9 and Sec. IV B, each plot depicts the first phase transition point p_1 as a function of temperature T for systems with a fixed q. Panels (a), (b), and (c) show the results for q = 16, 40, and 70, respectively. System sizes range from N = 256 to 2048 as indicated in each plot. All panels show that the first transition point increases as the system size increases which is consistent with the complexity trend of the system series.



In Fig. 16, we provide plots of scaled waiting correlation function data which clearly indicate spin-glass-type collapse. The collapse is best at the center of the computational susceptibility ridge [Fig. 16(b)]. The collapse persists up to the ends of the susceptibility ridge [e.g., p_out = p_1 in Fig. 16(a)] and is no longer valid outside the susceptibility ridge [e.g., p_out = 0.26 > p_2 = 0.24 in Fig. 16(c)].



B. χ(T, p_out) at fixed q

We fix the number of communities at q = 16, 40, or 70 and increase the system size N from 256 to 2048. The plots of computational susceptibility χ(T, p_out) for the q = 16 series of systems are shown in panels (a) through (d) of Fig. 9. As in Sec. IV A, the ridges indicate hard phases which become more prominent as N increases while the ridges at low temperature remain at relatively low constant values.

The areas of the easy phases in the lower left corner expand as the system size increases from panel (a) to (d). This trend of increasing area is the reverse of the behavior in the fixed α systems in Sec. IV A. This is easy to understand since the average cluster size increases with N here, and the high internal edge density p_in causes the larger clusters to be more strongly defined.

We increase the number of communities to q = 40 for the systems in panels (e) through (h). N varies from 256 to 2048, and α = q/N decreases as N increases so that the systems again become less complicated because the communities become more strongly defined. The hard and unsolvable phases in the small N = 256 system in panel (e) are difficult to distinguish. Only the easy phase can be easily identified by noting the flat region on the lower left of each panel. χ(T, p_out) peaks at increasing heights at both the low and high temperatures from panels (f) to (h), indicating that the phase transitions become more prominent as the system size increases.

We further increase the number of communities to q = 70 and study the phase transitions for the same range of system sizes. The hard phase at high temperature in
FIG. 12: (Color online) Corresponding to Fig. 9 and Sec. IV B, the convergence time τ [Fig. 4(a)] as a function of noise p_out for systems with fixed q. Panels (a), (b), and (c) show the results for q = 16, 40, and 70, respectively. System sizes vary from N = 256 to 2048 as indicated in each plot. The noise level p_out at the first peak of the convergence time corresponds to the initial transition point p_1 in Fig. 11 at zero temperature. As the system size increases, the first peak in the convergence time moves to the right. This is the same trend as in Figs. 10 and 11.



panel (i) is difficult to detect. χ(T, p_out) clearly shows the three phases in panels (j) and (k). The easy phases again become larger as the system size increases. χ(T, p_out) in the hard phase increases as N increases, indicating that the phase transitions at both low and high temperatures are more obvious from panel (i) to (k).



In Figs. 10 and 11, we also show corresponding 2D plots for the boundaries of the hard phase and the first transition point p_1 as a function of temperature T. In Fig. 10, the area of the hard phase becomes narrower as the system size increases. At q = 40, for example, the width of the hard phase for the smallest system at N = 256 is about ΔT = 1.5. As N increases, the hard phase width shrinks to ΔT = 1 at N = 512 and down to ΔT = 0.5 for N = 2048, which further indicates that the phase transition becomes sharper in the thermodynamic limit. In Fig. 11, the first transition point p_1 increases over the entire temperature range as N increases. This behavior is consistent with the system complexity trend as previously mentioned.

In Fig. 12, we further plot the convergence time τ as a function of noise p_out for a fixed number of communities q at zero temperature. The value of p_out at the first peak of the convergence time matches the first transition point p_1 in Fig. 11. As the system size increases, the peak moves to the right. This is also consistent with Fig. 11 where the system becomes less complicated as N increases.



C. Other information theoretic and thermodynamic quantities

We further corroborate the phase diagram of our systems using other information theoretic and thermodynamic quantities. These measures include the average normalized mutual information I_N between replica pairs, Shannon entropy H, and energy E as shown in Fig. 13. We additionally show the corresponding computational susceptibility χ from Fig. 5 or 9 for comparison. All panels are for a system of size N = 2048. In panels (a) through (d), q = 16 which corresponds to Fig. 9(d). Panels (e) through (h) plot results for q = 32 with α = 0.015 which corresponds to Fig. 5(d). Panels (i) through (l) display the results for q = 70 which corresponds to Fig. 9(l). Finally, panels (m) through (p) display results for q = 140 and α = 0.07 corresponding to Fig. 5(h).

All panels consistently display the three different complexity phases: the "easy" (flat region, lower left), "hard" (varied central regions), and "unsolvable" phases (far right or top). The existence of the hard phase is reflected by the ridges at both low and high temperatures in the susceptibility χ plot which often correspond to rapid shifts (up or down) in the other measures. In each plot, the red line serves as a guide to the eye to emphasize the boundaries between different phases. The boundaries are consistent with each other across the respective rows.

In Ref. [32], we also demonstrated the spin glass character of the phase transition by observing the exceptional collapse of time autocorrelation curves (over four orders of magnitude of time at high and low temperatures) in the vicinity of the hard phase. We further elaborated on evidence for phase transitions [32] in identifying community structure via a dynamical approach (other dynamical methods include [10, 42]), where we speculated that "chaotic-type" transitions may extend into the node dynamics for large systems.



V. NON-INTERACTING CLIQUES

As depicted in Fig. 17, we analytically estimate a minimum transition temperature by examining a system with q non-interacting cliques. In panel (a), each of the q communities consists of l nodes which are maximally connected, but no noise exists between these cliques. The presence of noise will, in general, lower the temperature T_× of the transition point which manifests as departure
[Fig. 13 panels: each row shows χ(T, p_out), I_N(T, p_out), H(T, p_out), and E(T, p_out) from left to right; (a)-(d) q = 16; (e)-(h) q = 32; (i)-(l) q = 70; (m)-(p) q = 140]

FIG. 13: (Color online) Plots of the computational susceptibility χ (column one), NMI I_N (column two), Shannon entropy H (column three), and energy E (column four) as functions of temperature T and intercommunity noise p_out. All systems use N = 2048, and q varies from 16 to 140 in different rows. All plots show the easy, hard, and unsolvable phases, often by rapid shifts in the respective measures. The red lines serve as a guide to the eye for emphasizing the manifestation of the hard phases in each measured quantity where we note that the boundaries match well across each row.



from the easy phase in certain regions of Figs. 5 and 9.



Within our algorithm and model, communities do not 
FIG. 14: (Color online) Plots of the susceptibility χ, convergence time τ, energy E, accuracy I_N, and Shannon entropy H in terms of noise p_out for the system with N = 2048 and q = 140 at zero temperature. All the plots show three phases as noise varies: (1) Below p_1 = 0.2, the system can be solved in this "easy" region (e.g., the accuracy is I_N = 1); (2) When 0.2 < p_out < 0.24, where the benefit of extra trials is the largest, it is "hard" to solve the system without misplacing nodes (e.g., χ, E, and H achieve their peaks); (3) Above p_2 = 0.24, the system is "impossible" to solve perfectly. [p_1, p_2] are generous bounds in transition crossover regions. Note that the two transitions are demonstrated to be of spin-glass type by observing the scaling of the correlation function between [p_1, p_2] in Fig. 16.



interact in an explicit sense. In addition, with this model 
problem the situation is greatly simplified because no 
edges are assigned between cliques, so we use Eq. ^ to 
calculate the partition function of the system by count- 
ing the energy contribution of all edges within each clus- 
ter over the number of combinations for partitioning the 
clusters. As a further simplification, we also set the energy contribution for a single edge to be -2 so that the Hamiltonian gives an energy of -1 for each edge.



A. Partition function 

First, we investigate the smallest non-trivial clique size with l = 3 nodes. The partition function for the decoupled cliques is

    Z = (Z_l)^q = \left[ \sum_{\{\sigma\}} e^{-\beta H_l} \right]^q,    (4)

where Z_l is the partition function for a single clique and β = 1/T is the inverse temperature. Considering the l = 3 cluster combinations depicted in Fig. 17(b), Z_3 is

    Z_3 = q e^{6\beta} + 3q(q-1) e^{2\beta} + 3q(q-1)(q-2) + q(q-1)(q-2)(q-3).    (5)

The first term represents the optimal local cluster solution, and the sum of the remaining terms accounts for the remaining sub-optimal local partitions. We define ω_l as the ratio of Boltzmann weights of the sub-optimal partitions to the optimal solution. For Z_3, the ratio ω_3 is

    \omega_3 = \frac{q(q-1)\left[ 3e^{2\beta} + 3(q-2) + (q-2)(q-3) \right]}{q e^{6\beta}}.    (6)

ω_l < 1 indicates that the optimal solution is dominant, while ω_l → ∞ means the system is disordered. We can define ω_l = 1 as the transition point from the ordered phase to the disordered phase, and the corresponding "crossover" temperature T_× is found by solving the transcendental equation

    3(q-1) e^{-4/T_\times} + 3(q-1)(q-2) e^{-6/T_\times} + (q-1)(q-2)(q-3) e^{-6/T_\times} = 1.    (7)

In the limit of large q, this equation simplifies to

    q^3 e^{-6/T_\times} \simeq 1,    (8)

which yields our estimate for the crossover temperature

    T_\times \simeq \frac{2}{\log q}    (9)

for the l = 3 clique system.
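The large q estimate of Eq. (9) can be checked against a direct numerical solution of Eq. (7). The sketch below is illustrative only: it assumes Eq. (7) reads 3(q-1)e^{-4/T} + 3(q-1)(q-2)e^{-6/T} + (q-1)(q-2)(q-3)e^{-6/T} = 1, and the function names, bracketing interval, and iteration count are hypothetical choices rather than anything taken from the original analysis.

```python
import math


def omega3(T, q):
    """Sub-optimal over optimal Boltzmann weight for one l = 3 clique.

    Assumed reading of Eq. (7): every term is already divided by the
    optimal weight q * exp(6 / T).
    """
    b = 1.0 / T
    return (q - 1) * (
        3.0 * math.exp(-4.0 * b)
        + (3.0 * (q - 2) + (q - 2) * (q - 3)) * math.exp(-6.0 * b)
    )


def crossover_temperature(q, t_lo=1e-3, t_hi=50.0, iters=200):
    """Bisect omega3(T, q) = 1; omega3 grows monotonically with T."""
    for _ in range(iters):
        t_mid = 0.5 * (t_lo + t_hi)
        if omega3(t_mid, q) < 1.0:
            t_lo = t_mid  # optimal partition still dominates: go hotter
        else:
            t_hi = t_mid
    return 0.5 * (t_lo + t_hi)


if __name__ == "__main__":
    for q in (16, 140, 10**4, 10**8):
        # numerical root of Eq. (7) next to the asymptotic form 2 / log(q)
        print(q, crossover_temperature(q), 2.0 / math.log(q))
```

Under these assumptions the two columns should approach each other as q grows, since the neglected terms are suppressed by powers of 1/q.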

If we generalize to arbitrary clique size l, the corresponding partition function for a single clique becomes

    Z_l = q e^{\beta l(l-1)} + l\, q(q-1) e^{\beta (l-1)(l-2)} + \frac{l(l-1)}{2} q(q-1)(q-2) e^{\beta (l-2)(l-3)} + \cdots + q(q-1)(q-2)\cdots(q-l).    (10)

Again, the first term in Eq. (10) is the Boltzmann weight of the optimal clique partition, and the other terms sum the weights of the incorrect partitions. ω_l is

    \omega_l = \frac{Z_l - q e^{\beta l(l-1)}}{q e^{\beta l(l-1)}},    (11)

and ω_l = 1 returns the crossover temperature T_× for arbitrary cliques of size l. We summarize the crossover temperature relations in column one of Table I where we express e^{1/T_×} in terms of powers of q for several values of l. The general relation is

    T_\times \simeq \frac{l-1}{\log q}.    (12)
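The same numerical check extends to general clique size l. The sketch below assumes that the k-th term of Eq. (10) is C(l, k) q(q-1)...(q-k) exp[β(l-k)(l-k-1)] for k = 0, ..., l. This reading reproduces the four terms of Eq. (5) at l = 3, but the coefficients of the elided middle terms are an assumption, as are all function names. The ratio ω_l is evaluated in log space so that low temperatures do not overflow.

```python
import math


def log_omega(T, q, l):
    """log(omega_l): sub-optimal over optimal weight for one clique of size l.

    Assumed term structure (k = 1..l are the sub-optimal partitions):
        C(l, k) * q (q-1) ... (q-k) * exp(beta * (l-k) * (l-k-1))
    Requires q > l so that every factor (q - j) is positive.
    """
    b = 1.0 / T
    log_terms = []
    for k in range(1, l + 1):
        log_count = math.log(math.comb(l, k)) + sum(
            math.log(q - j) for j in range(k + 1)
        )
        log_terms.append(log_count + b * (l - k) * (l - k - 1))
    top = max(log_terms)  # log-sum-exp for numerical stability
    log_sub = top + math.log(sum(math.exp(t - top) for t in log_terms))
    log_opt = math.log(q) + b * l * (l - 1)
    return log_sub - log_opt


def crossover_temperature(q, l, t_lo=1e-3, t_hi=50.0, iters=200):
    """Bisect log(omega_l) = 0, i.e. omega_l = 1."""
    for _ in range(iters):
        t_mid = 0.5 * (t_lo + t_hi)
        if log_omega(t_mid, q, l) < 0.0:
            t_lo = t_mid
        else:
            t_hi = t_mid
    return 0.5 * (t_lo + t_hi)


if __name__ == "__main__":
    q = 10**8
    for l in (3, 4, 6):
        print(l, crossover_temperature(q, l), (l - 1) / math.log(q))
```

For large q the numerical root should track (l - 1)/log q, which is the content of Eq. (12).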



B. Symmetry Breaking 

We can inquire about the crossover temperature T_× from another perspective. Take two nodes i and j in the same clique. If the probability that a solution assigns them to the same community is high, then the system is in the "ordered" state. If this probability is 1/q, the
FIG. 15: (Color online) (a) Plots of the susceptibility χ, convergence time τ, energy E, accuracy I_N, and Shannon entropy H in terms of noise p_out for the system with N = 1024 and q = 70 at zero temperature. (b) The normalized mutual information I_N in terms of noise p_out for a series of systems of size N = 1024 but different numbers of communities q. From both plots, we are able to detect the first and second transition points p_1 and p_2. p_1 is the point where I_N drops from 1, χ increases from 0, τ achieves its peak, and E and H increase from some constant value. p_2 is the position where the I_N curves with different numbers of communities collapse as shown in (b). p_2 also corresponds to the peak of energy and entropy as shown in (a).




[Fig. 16 panels, horizontal axis u(t_w, t): (a) p_out = 0.2 is within the zero temperature "hard" phase, where the collapse is perfect; (b) p_out = 0.22 is within the zero temperature "hard" phase, where the collapse is perfect; (c) p_out = 0.26 is within the zero temperature "unsolvable" phase, where the collapse is poor.]



FIG. 16: (Color online) We show a collapse of the correlation curves for different waiting times t_w for a system with N = 2048 nodes and q = 140 communities, where p_out varies from 0.2 in panel (a) to 0.26 in panel (c). The first and second transition points for this system are p_1 = 0.2 and p_2 = 0.24. The temperature is T = 0. The vertical axis is g(t)C(t_w, t) where g(t) = 8 - log_10(t) and C(t_w, t) = (1/N) Σ_{i=1}^{N} δ_{σ_i(t_w), σ_i(t_w + t)} is the correlation function. The horizontal axis is u(t_w, t) = [(t + t_w)^{1-μ} - t_w^{1-μ}]/(1 - μ) where μ = 0.1. The noise values p_out = 0.2 (a) and p_out = 0.22 (b) lie within the "hard" region where the collapse of the correlation function is perfect. The noise p_out = 0.26 (c) is above the second transition point p_2 in the "unsolvable" region, where the collapse becomes poor. That the collapse of the correlation function starts to degrade right after the second transition point p_2 at zero temperature indicates that this transition is of the spin-glass type.



system is in its "disordered" phase. We can define a crossover temperature T_×^{(1/q)} at which the probability of nodes i and j being in the same cluster exceeds 1/q and thus the symmetry between Potts spins is broken. This probability P(σ_i = σ_j) = ⟨δ_{σ_i,σ_j}⟩ is

    P(\sigma_i = \sigma_j) = \frac{\sum_{\{\sigma\}} \delta_{\sigma_i,\sigma_j} e^{-\beta H}}{\sum_{\{\sigma\}} e^{-\beta H}},    (13)

where σ_i and σ_j denote the cluster memberships for nodes i and j, respectively. Expressing the numerator and denominator in
FIG. 17: (Color online) Panel (a) depicts q independent cliques (maximally connected clusters). Panel (b) indicates the different combinations of l = 3 nodes which must be summed (including three copies of the 2-1 configuration) in order to determine the partition function for a single clique.
TABLE I: In column one, the crossover temperature T_× from an "ordered" to a "disordered" state is determined by setting the ratio ω_l = 1 of the sum of Boltzmann weights of sub-optimal node assignments to the weight of the optimal assignment into clique communities as a function of the cluster size l and the number of communities q in the large q limit. In column two, we estimate T_×^{(1/q)} ∼ T_× through different means by calculating the probability p = 1/q that two nodes (in the same clique ideally) are determined to be in the same cluster. In the last column, we generalize column two for an arbitrary probability p.



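terms of l and q, Eq. (13) becomes

    P(\sigma_i = \sigma_j) = \frac{q e^{\beta l(l-1)} + \cdots + q(q-1)\cdots(q-l+2) e^{2\beta}}{q e^{\beta l(l-1)} + l\, q(q-1) e^{\beta (l-1)(l-2)} + \cdots + q(q-1)(q-2)\cdots(q-l)}.    (14)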

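In the limit of large q, Eq. (14) simplifies to

    P(\sigma_i = \sigma_j) \simeq \frac{q e^{\beta l(l-1)} + \cdots}{q e^{\beta l(l-1)} + l\, q^2 e^{\beta (l-1)(l-2)} + \cdots + q^{l+1}}.    (15)

Choosing P(σ_i = σ_j) = 1/q yields a crossover temperature T_×^{(1/q)} at which the system goes from an unbroken q-state symmetry to an ordered state. When l = 3, Eq. (15)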
FIG. 18: (Color online) The crossover temperature at which the system cannot be perfectly solved as a function of the system size N. The data here use cliques of size l = 3. The dashed line is the analytical result and the solid line is determined by the heat bath community detection algorithm optimizing the Hamiltonian of Eq. (1).



becomes

    P(\sigma_i = \sigma_j) \simeq \frac{q e^{6\beta} + q^2 e^{2\beta}}{q e^{6\beta} + 3q^2 e^{2\beta} + q^4}.    (16)

In the large q limit, e^{6β} ≃ q^2 and the crossover temperature is T_×^{(1/q)} ≃ 3/log q. The asymptotic expressions for several values of q and l are summarized in column two of Table I. For general q and l, the relation is

    T_\times^{(1/q)} \simeq \frac{l}{\log q}.    (17)



For a general crossover probability P(σ_i = σ_j) = p with l = 3, the crossover temperature T_×^{(p)} is determined by solving

    q e^{6\beta} + q^2 e^{2\beta} = p\, q e^{6\beta} + 3p\, q^2 e^{2\beta} + p\, q^4.    (18)

In the large q limit, Eq. (18) is e^{6β} ≃ p q^3, where T_×^{(p)} ≃ 2/(log q + (1/3) log p). Results for T_×^{(p)} for several values of q and l are shown in column three of Table I. For general q and l, the relation is

    T_\times^{(p)} \simeq \frac{l-1}{\log q + \frac{1}{l} \log p}.    (19)

C. Simulated crossover temperature 

We can also simulate the crossover temperature T_× or T_×^{(p)} as a function of system size N by solving the non-interacting clique problem using our heat bath community detection algorithm (see Appendix D). As seen in Fig. 18, the simulated and analytic asymptotic behaviors agree well in the large N limit, so the crossover temperature for this trivial system is T_× = 0.

The crossover temperature derived in this section deals with a heat-induced disorder. That is, it marks the onset of a "liquid" phase that transitions at a lower heat bath temperature as the system size grows. In practice, one uses an SA algorithm that applies a cooling scheme (as opposed to a constant temperature HBA) to improve the attempt at locating the ground state of the system. That is, it applies a high temperature exploration of the general landscape finished by low temperature "fine tuning" of the solution. For the non-interacting cliques in this section, SA would obviously still identify the ground state because the energetic fluctuations would trivially diminish as the system is cooled toward T = 0.

With increasing p_out at low T, disorder imposed by the glass-type transition is induced by the complexity of the energy landscape, but the transition is qualitatively comparable in the sense of the induced disorder in the solutions found by the HBA. The glass phase also experiences a transition to a liquid-like disordered state at a temperature that increases slowly with the level of noise, but here, an SA solver will not necessarily transition readily to the ideal solution as the system is cooled because of the inherent complexity of the energy landscape. The greedy algorithm used in [31] (equivalent to the HBA at T = 0) applied to the Potts model of Eq. (1) is already
very accurate [HI [151 El] ? so we expect that the greatest 
benefit of SA over a greedy-oriented solver using Eq. (1) will manifest in the hard region near the onset of the "glassy" transition.



D. A discussion of the crossover temperature 

For a spin system with fixed size N, a larger number of spin states q corresponds to a more disordered system. If we expand the partition function of the Potts model in terms of 1/q, it is explicitly represented as a sum over configurations with progressively larger clusters of identical spins [50]. That is, two spins with the same index σ_i = σ_j are connected. Then three spins σ_i = σ_j = σ_k are connected, etc. The resulting terms illustrate that increasing q emulates increasing temperature T.

Our analysis in this section applies to general graphs with ferromagnetic interactions (equivalent to the "label propagation" community detection algorithm [51]) on regular, fixed-coordinate lattices [52-54]. Increasing the number of system states q causes the system to be increasingly disordered. Thus, in the community detection problem, when the number of communities q increases linearly with the system size N (such that the average community size remains constant), the solvable (easy) phase shrinks to a "small" region as N → ∞.



Figures 13(m-p) illustrate the distinction in the different regions or types of disorder: entropic (high complexity) and energetic (high T). Interestingly, in some cases,




[Fig. 19 panels: (a) Original; (b) Easy; (c) Hard; (d) Unsolvable]



FIG. 19: (Reproduced from Ref. [57]) We show an image where we apply our community detection algorithm to detect the relevant structures. This case seeks to identify a bird and tree against a sky background. The original image is in panel (a), and the segmentation results are shown in panels (b-d) corresponding to the easy, hard, and unsolvable regions of the community detection problem, respectively. Figure 20 shows the phase diagram identifying these respective regions.




FIG. 20: (Reproduced from Ref. [57]) We show a three-dimensional phase diagram of NMI (I_N) versus log(γ) and log(T) for the image segmentation of the bird in Fig. 19. T is the heat bath temperature for a stochastic community detection solver (see Appendix D), and γ is the model weight in Eq. (1). We note that the optimal values in the easy and hard regions correspond to the "physical" segmentations of the bird and tree against the background, but the bird is undetectable in the unsolvable region.
additional noise emulates a higher temperature solution process in the sense that it provides additional avenues to explore different configurations. Such an effect may occur in Fig. 13(a-d) where the accuracy [I_N in panel (b)] increases over a short range with increasing noise p_out. Fig. 13(n) further shows a crossover region 0.24 < p_out ≲ 0.32 where mid-range temperatures improve the solution accuracy (higher I_N). Although these data use a constant temperature heat bath (no cooling schedule), this is the effect of a stochastic solver (see Appendix D), allowing it to navigate the difficult energy landscape more accurately than a greedy solver. On the left (lower T), the more greedy nature of the solver prevents an accurate solution in the presence of high noise. On the right, the higher temperature of the heat bath itself hinders an accurate solution. In effect, the HBA "wanders" at energies above the meaningful, but locally complex, features of the energy landscape, resulting in more random solutions.

The results here incorporate a "global" model parameter γ in Eq. (1). That is, the model asserts globally optimal γ(s) for the entire graph. For large graphs, this condition is less likely to be true across the full scope of the network, but one can explore methods to obtain a locally optimal γ_ℓ (in time or space) for each region or cluster ℓ. Utilizing locally optimal values of γ_ℓ will likely work to circumvent the temperature transition at low levels of noise. The successful selection of a local γ_ℓ in the glassy (high noise) region is more difficult because of the complex nature of the local energy landscape.

In the following section, we study the free energy of several systems for ferromagnetic Potts models and then generalize to arbitrary weighted Potts models, including antiferromagnetic interactions, on arbitrary graphs [56].



VI. AN EXAMPLE OF A PHASE TRANSITION 
IN AN IMAGE SEGMENTATION PROBLEM 

We illustrate the phase transition effect with a realistic image segmentation example [57]. In Fig. 19, we apply our community detection algorithm to detect a bird and tree against a sky background. We display the results in Fig. 20 where we plot NMI (I_N) versus log(γ) in Eq. (1) and log(T) where T is the temperature for our stochastic community detection solver (see Appendix D). For this problem, we apply edge weights by replacing the A_ij elements in Eq. (1) with "attractive" and "repulsive" weights W_ij which are defined by regional intensity differences within the image [57].

We label the easy (b), hard (c), and unsolvable (d) re- 
gions in the phase plot for the bird image in panel (a). 
Panel (b) shows that our algorithm clearly detects the 
bird and tree against the background, meaning that the 
NMI information measure identifies the physically rel- 
evant clusters in the problem. In panel (c), the back- 
ground is segmented separately, but the bird and tree 
are composed of many small clusters. Panel (d) shows 



that the bird is undetectable in the unsolvable region. 



VII. FREE ENERGY: SIMPLE RESULTS 

In the following analysis, we explicitly show the large q and large T expansions for the free energy per site in three example systems (a non-interacting clique system, a simple interacting clique system, and a random graph) before generalizing the analysis to arbitrary unweighted and weighted graphs. Previous works examined disorder transitions for random-bond Potts models [58, 59], and Ref. [60] studied zeros of the partition function in the large q limit. Large q behavior was shown to approach mean-field theoretical results on fixed lattices [61, 62]. For the unweighted systems, we use a binary distribution for the interaction strength J = 1 or 0 (i.e., the energy contribution of an edge is either "on" or "off").



A. Free energy of a non-interacting clique system 
under a large q expansion 

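If we generalize the non-interacting clique system in Fig. 17 to cliques of size l, the partition function is

    Z = \left\{ q e^{\beta J l(l-1)} + l\, q(q-1) e^{\beta J (l-1)(l-2)} + \frac{l(l-1)}{2} q(q-1)(q-2) e^{\beta J (l-2)(l-3)} + \cdots + q(q-1)(q-2)\cdots(q-l) \right\}^q.    (20)

When q → ∞,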



+ 



J+i 



(21) 



The free energy per site, f = -(T/N) log Z (with the Boltzmann constant set to k_B = 1), is



1-2 



f ^ -Tlogq -TJ2 



k=0 



.^j{l-\) 



-(fc+i) 



(22) 

From Eq. (22), we further simplify the free energy per site
/ « -Tlogg-T^a(fc)eMI-l)g-(fc+i) 



k=0 



f ^ -Tlogq -Ta{0)- 



(23) 



where a{k) = {^~^^)-^^' We will compare Eq. (23) with 
the high T expansion in the next section. Despite the functional dependence on exp(βJ), the large q limit dominates the expansion, forcing the system to be approximately equivalent to a large temperature limit.
B. Free energy of a non-interacting clique system as ascertained from a high temperature expansion

Note that the most ordered Potts graph is a system of non-interacting cliques (maximally connected subgraphs). That is, the presence of noise (extraneous intercommunity edges) will only serve to increase the overall disorder in the system. One exception is that increased disorder can emulate increased temperature T for both greedy and stochastic community detection solvers (see also Sec. V D).



We can construct the high T expansion easily by means of Tutte polynomials [63] (see Appendix E 1) where we again solve a system of q cliques of size l. Equation (1) and a ferromagnetic Potts model have the same ground state energy for this clique system (see also Secs. VII E, VII G, and VII H for more general derivations), so the partition function in terms of the Tutte polynomial t(G; x, y) for a graph G is

    Z = q^{k(G)} v^{|V| - k(G)} t(G; x, y),    (24)

where q is the number of clusters or states, v = exp(βJ) - 1, G denotes the graph, k(G) is the number of connected components in G, |V| = N is the number of vertices, x = (q + v)/v, and y = v + 1. For the non-interacting clique system, k(G) = q and N = lq. We denote the Tutte polynomial of a single clique of size l as K_l(G; x, y). K_2(G; x, y) = x, so the partition function is

    Z = q^q v^q x^q,    (25)

where we used N = 2q. In a high T approximation, x ≃ q/v ≫ 1, so the partition function becomes Z ≃ q^{2q} and the free energy is

    f \simeq -T \log q,    (26)



which simply states that the system is completely random in the large T limit.

For triangle cliques, K_3(G; x, y) = x^2 + x + y. The graph G is composed of disjoint triangles, so the Tutte polynomial is t(G; x, y) = (x^2 + x + y)^q, and the partition function becomes

    Z = q^q v^{2q} (x^2 + x + y)^q.    (27)

In a high T approximation v ≪ 1, but x ≃ q/v ≫ 1 in either the large q or large T limits, so we make a further approximation of y ≃ 0. Then, t(G; x, y = 0) = x^q (x + 1)^q ≃ x^{2q}. The partition function simplifies to Z ≃ q^{3q} so the free energy per site for l = 3 is again

    f \simeq -T \log q,    (28)

which is identical to the l = 2 result because we consistently applied the approximation q/v ≫ 1 to x = (q/v + 1) ≃ q/v and (x + 1) = (q/v + 2) ≃ q/v.
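The Tutte polynomial route can be cross-checked by brute force on a single clique. The sketch below is illustrative: it assumes the standard ferromagnetic Potts weight exp(βJ) = 1 + v for each satisfied edge, and all function names are hypothetical. It enumerates all q^n spin states for one edge (l = 2) and one triangle (l = 3), and compares the sums with q^{k(G)} v^{|V| - k(G)} t(G; x, y) using t = x and t = x^2 + x + y, respectively.

```python
import itertools
import math


def potts_z_bruteforce(edges, n, q, v):
    """Sum over all q**n states of prod_edges (1 + v * delta(s_i, s_j))."""
    total = 0.0
    for state in itertools.product(range(q), repeat=n):
        weight = 1.0
        for i, j in edges:
            if state[i] == state[j]:
                weight *= 1.0 + v
        total += weight
    return total


def potts_z_edge(q, v):
    """Single edge (l = 2): k(G) = 1, |V| = 2, Tutte polynomial x."""
    x = (q + v) / v
    return q * v * x


def potts_z_triangle(q, v):
    """Single triangle (l = 3): k(G) = 1, |V| = 3, Tutte polynomial x^2 + x + y."""
    x = (q + v) / v
    y = v + 1.0
    return q * v**2 * (x * x + x + y)


if __name__ == "__main__":
    triangle = [(0, 1), (1, 2), (0, 2)]
    q, T, J = 10, 50.0, 1.0
    v = math.expm1(J / T)
    z = potts_z_bruteforce(triangle, 3, q, v)
    print(z, potts_z_triangle(q, v))
    # free energy per site next to the high-T estimate -T log q
    print(-T * math.log(z) / 3.0, -T * math.log(q))
```

Because the cliques are decoupled, the full partition function for q cliques is simply the single-clique value raised to the power q, so the per-site free energy is unchanged.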




FIG. 21: (Color online) A depiction of a circle of cliques (maximally connected clusters) of size l connected by single edges. In contrast to Fig. 17, this system adds a simple interaction between cliques. We analyze the configuration in Sec. VII C and show that a ferromagnetic Potts model behaves the same in the large q and large T limits.



Generalizing to an arbitrary clique size l in the high T approximation, the Tutte polynomial K_l(G; x, y = 0) is

    K_l(G; x, y = 0) = \frac{\Gamma(x + l - 1)}{\Gamma(x)}.    (29)

The partition function is

    Z \simeq q^q v^{(l-1)q} \left[ \frac{\Gamma(q/v + l - 1)}{\Gamma(q/v)} \right]^q    (30)

and v = e^{βJ} - 1 ≃ βJ, so the free energy per site yields

    f \simeq -T \log q - \frac{l-1}{l} T \log\left( \frac{\beta J}{q} \right) - \frac{T}{l} \log\left[ \frac{\Gamma\left( \frac{q}{\beta J} + l - 1 \right)}{\Gamma\left( \frac{q}{\beta J} \right)} \right].    (31)

The leading log q term represents the infinite T limit which is approximately constant in large systems for any clique size l. That is, the partition function Z_{T→∞} ≃ q^N for every system. The l = 2 and 3 results above illustrate that when l ≪ q, the ratio of gamma functions in Eq. (31) simplifies to x^{l-1} and the free energy for the non-interacting clique system is approximately f ≃ -T log q in the large T limit.



The second term in Eq. (31) gives the leading order correction for high T. It is absent in the explicit l = 2 and 3 results above because we applied the approximation q/v ≫ 1. Together, the last two terms imply that increasing the temperature T (decreasing β) emulates increasing the number of communities q for a ferromagnetic Potts model.
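The gamma function form of the high T free energy is easy to evaluate numerically. The sketch below assumes the per-clique partition function q v^{l-1} Γ(q/v + l - 1)/Γ(q/v) with v = exp(βJ) - 1, which is our reading of Eq. (30) in the y ≃ 0, x ≃ q/v approximation. The function name and parameter values are hypothetical. It uses log-gamma differences to avoid overflow and compares the result with the leading term -T log q.

```python
import math


def free_energy_per_site(T, q, l, J=1.0):
    """Free energy per site for q decoupled cliques of size l (high-T form).

    Assumed per-clique partition function:
        q * v**(l - 1) * Gamma(x + l - 1) / Gamma(x),  x = q / v,
    with v = exp(J / T) - 1.  Each clique has l sites.
    """
    v = math.expm1(J / T)
    x = q / v
    log_tutte = math.lgamma(x + l - 1) - math.lgamma(x)
    log_z_clique = math.log(q) + (l - 1) * math.log(v) + log_tutte
    return -T * log_z_clique / l


if __name__ == "__main__":
    q, T = 1000, 10.0
    for l in (2, 3, 5):
        # full expression next to the leading term -T log q
        print(l, free_energy_per_site(T, q, l), -T * math.log(q))
```

For l = 2 and l = 3 the gamma ratio reduces to x and x(x + 1), matching the explicit Tutte polynomials at y = 0 quoted earlier in this subsection.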
C. Free energy for the "circle of cliques" in the high q or the high T expansion

We now investigate the slightly more complicated system depicted in Fig. 21, a "circle of cliques" where each complete sub-graph cluster is connected to its neighbors by a single edge. We construct q cliques of size l = 3 and apply the Tutte polynomial method [63] to solve the system. As in the previous sub-section, the ground state of Eq. (1) and a ferromagnetic Potts model have the same energy, so we use a ferromagnetic model. In terms of the Tutte polynomial t(G; x, y) for a graph G, the partition function is given by Eq. (24).

Equation (F4) in Appendix F derives the exact Tutte polynomial for Fig. 21 with $l = 3$, and Eq. (F6) gives the high $T$ expansion $t(G; x, y = 0) \simeq (1 + x)^{q+1} x^{2q-3}$. Substituting $N = 3q$ and the approximation $x \simeq q/v$ (in either the large $q$ or large $T$ limits), the partition function becomes



2g-2^(?+2 



(1 



-xY^^x^^- 



(32) 



We factor out $q^{3q}$ and then apply the approximations $v = \exp(\beta J) - 1 \simeq \beta J$, $x \simeq q/v \simeq q/(\beta J) \gg 1$, and $q \gg 1$. The free energy per site is then



f^-T\ogq 



2T f q 



(33) 



As in the previous sub-section, the leading $\log q$ term represents the infinite $T$ limit. Equation (33) affirms the implication of Eq. (31) regarding the corresponding behavior of large $q$ or $T$. Specifically, increasing the temperature (decreasing $\beta$) emulates increasing the number of communities $q$ for a ferromagnetic Potts model.




FIG. 22: (Color online) A sample depiction of a random graph with $N$ nodes. In Sec. VII D, we analyze such a system by randomly removing edges from a clique configuration of $N$ nodes under the assumption that we maintain a connected graph. We show that a ferromagnetic Potts model on a random graph behaves the same in the large $q$ and large $T$ limits.



D. Free energy of a random graph in a large q or a
large T expansion

We apply the Tutte polynomial method of Appendix E 1 to determine the high $T$ and high $q$ partition function for a random graph. For calculation purposes, we begin with a complete graph of size $N$. Then we randomly remove edges to construct a random graph such that any two nodes are connected by an edge with a probability $p$. The derivation repeatedly applies lemma 1 stated in Appendix E 1.

We denote the Tutte polynomial of a complete graph (clique) of size $l$ as $K_l$. The Tutte polynomial $t(G)$ for a clique with $d$ duplicated edges (multiply defined edges between two nodes) or loops (self-edges) is denoted $K_l^{[d]}$. For economy of notation, we also define $G_l^{[m]}$ as the Tutte polynomial of a graph with $m$ missing edges (i.e., not a clique). Note that $K_l^{[0]} = G_l^{[0]} = K_l$. For the following derivation, we work under the assumption that when we delete or contract any edge, the random graph remains connected.

Under the high temperature $T$ or high number of clusters $q$ approximations, $y \ll x$ and $y \simeq 0$. Equation (29) gives the exact expression of the Tutte polynomial $K_N(G; x, y = 0)$ for a clique at $y = 0$. If we cut one edge from the complete graph $K_N$, we obtain the recursion formula

$$K_N = G_N^{[1]} + K_{N-1}^{[N-2]} = G_N^{[1]} + K_{N-1}, \tag{34}$$

where we applied lemma 1 to obtain Eq. (34). Henceforth, we assume the application of lemma 1. We are interested in the graph with missing edges, so we solve Eq. (34) for $G_N^{[1]}$,

$$G_N^{[1]} = K_N - K_{N-1}. \tag{35}$$



Note that the reduced graph is represented as a summation over complete graphs.

Now we apply the Tutte recursion formula to both sides of Eq. (35),

$$G_N^{[2]} + G_{N-1}^{[1]} = G_N^{[1]} + K_{N-1} - G_{N-1}^{[1]} - K_{N-2}. \tag{36}$$



We can choose the deleted and contracted edges in the corresponding terms to be identical because the resulting Tutte polynomial is in general independent of the operation order. After collecting terms and substituting the previous $G^{[1]}$ result, we solve for $G_N^{[2]}$ to obtain

$$G_N^{[2]} = K_N - 2K_{N-1} + K_{N-2} \tag{37}$$

for this particular random graph. Again, the right-hand side of Eq. (37) is a summation over complete graphs.

This recursive relation for $G_N^{[k]}$ continues until we obtain

$$G_N^{[k]} = \sum_{i=0}^{k} (-1)^i \binom{k}{i} K_{N-i}. \tag{38}$$
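As a small sanity check on the recursion, the sketch below evaluates the alternating sum over cliques at $y = 0$, assuming Eq. (38) has the form $G_N^{[k]} = \sum_{i=0}^{k} (-1)^i \binom{k}{i} K_{N-i}$ with $K_n = \Gamma(x+n-1)/\Gamma(x)$ from Eq. (29). The function names are hypothetical. The comparison case is the 4-cycle, which is $K_4$ with two edges removed that share no endpoint, and whose Tutte polynomial at $y = 0$ is $x^3 + x^2 + x$.

```python
from math import comb

def clique_tutte_y0(n: int, x: int) -> int:
    """T(K_n; x, 0) = x (x + 1) ... (x + n - 2) = Gamma(x + n - 1) / Gamma(x)."""
    result = 1
    for j in range(n - 1):
        result *= x + j
    return result

def missing_edges_tutte_y0(n: int, k: int, x: int) -> int:
    """Alternating binomial sum over smaller cliques (assumed form of Eq. (38))."""
    return sum((-1) ** i * comb(k, i) * clique_tutte_y0(n - i, x)
               for i in range(k + 1))

x = 2
value = missing_edges_tutte_y0(4, 2, x)   # K_4 - 2 K_3 + K_2 at x = 2
cycle = x**3 + x**2 + x                   # T(C_4; x, 0)
print(value, cycle)
```

Integer arithmetic is used for the rising factorial so that the comparison is exact for integer $x$.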
We insert this into Eq. (29) with the pre-factor $q v^{N-1}$ to generate the partition function at high $T$

$$Z \simeq q\, v^{N-1} \sum_{i=0}^{k} (-1)^i \binom{k}{i} \frac{\Gamma(x + N - i - 1)}{\Gamma(x)}. \tag{39}$$

We substitute $x = (q + v)/v \simeq q/v$ when $v \ll q$ (high $T$ or high $q$ approximations) and again utilize $v = e^{\beta J} - 1 \simeq \beta J$ in the high $T$ approximation to obtain the free energy per site

$$f \approx -T \log q - \frac{N-1}{N}\, T \log\left(\frac{\beta J}{q}\right) - \frac{T}{N} \log\left[\sum_{i=0}^{k} (-1)^i \binom{k}{i} \frac{\Gamma\left(\frac{q}{\beta J} + N - i - 1\right)}{\Gamma\left(\frac{q}{\beta J}\right)}\right]. \tag{40}$$

Note that the first two terms become $-T\log(q)/N - T\log(\beta J)$ as $N \to \infty$. From Eq. (40), we obtain the same conclusion for this random graph as for the previously analyzed clique systems. While Secs. VII A, VII B, and VII C result in free energies with different functional forms, in each case, $q$ and $T$ have the same functional form in the arguments of the functions in the high $T$ limit.



E. Free energy of an arbitrary graph G in the large 
T expansion 

We can construct the explicit high $T$ expansion for an arbitrary (unweighted) graph $G$ by means of the Tutte polynomial method [63]. Factoring out $q^N$ and substituting $|V| = N$, $x = q/v + 1$, and $y = v + 1$ in Eq. (24), we write a trivially modified form of the partition function

$$Z = q^N \left(\frac{v}{q}\right)^{N - k(G)} t\left(G; \frac{q}{v} + 1, v + 1\right). \tag{41}$$

At this point, the equation is completely general, but the corresponding behavior for temperature $T$ and number of clusters $q$ is almost apparent in the reciprocal relationship of $q$ and $v$.

Again, $x \simeq q/v$ in either the large $q$ or large $T$ limits. In a high $T$ approximation, $v \simeq \beta J = J/T$ and $y \simeq 0$ or $1$ ($y = 0$ is a common approximation since $x \gg y$ in the same limit), so

$$Z \approx q^N \left(\frac{J}{qT}\right)^{N - k(G)} t\left(G; \frac{qT}{J}, y_a\right), \tag{42}$$

where $y_a = 0$ or $1$. The free energy per site is then

$$f \approx -T \log q - \frac{N - k(G)}{N}\, T \log\left(\frac{J}{qT}\right) - \frac{T}{N} \log t\left(G; \frac{qT}{J}, y_a\right). \tag{43}$$


The leading $\log q$ term appears in our previous calculations. Again, it represents the infinite $T$ limit for an arbitrary system, which is approximately constant in large systems.

From the perspective of increasing $q$, the similarity to the large $T$ behavior is more apparent if we fix the temperature $T = T'$ and define an effective interaction constant $J_q = e^{J/T'} - 1$. We then rewrite Eq. (43) as

$$f \approx -T' \log q - \frac{N - k(G)}{N}\, T' \log\left(\frac{J_q}{q}\right) - \frac{T'}{N} \log t\left(G; \frac{q}{J_q}, y_a\right), \tag{44}$$

where $y_a = e^{J/T'}$ is a constant. When $N \to \infty$ and $k(G) \ll N$, the first two terms become $-T\log(q)/N - T\log(\beta J)$. Comparing Eqs. (43) and (44) shows the close correspondence between increasing $q$ (at fixed $T'$) and increasing $T$. $J_q$ grows exponentially faster than $q$ with decreasing $T'$, so a finite (perhaps small) stable or solvable region is likely except in the presence of high noise.
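The reciprocal role of $q$ and $v$ can be illustrated numerically. The sketch below only evaluates the Tutte variable $x = (q + v)/v$ with $v = e^{J/T} - 1$ and compares doubling $q$ at fixed $T$ with doubling $T$ at fixed $q$; the function name and parameter values are illustrative assumptions.

```python
import math

def tutte_x(q: float, T: float, J: float = 1.0) -> float:
    """x = (q + v) / v with v = exp(J / T) - 1."""
    v = math.expm1(J / T)
    return (q + v) / v

x_ref = tutte_x(q=100.0, T=50.0)
x_double_q = tutte_x(q=200.0, T=50.0)    # double the number of communities
x_double_T = tutte_x(q=100.0, T=100.0)   # double the temperature instead
rel_diff = abs(x_double_q - x_double_T) / x_double_q
print(x_ref, x_double_q, x_double_T, rel_diff)
```

In the high $T$ regime, both changes move $x$ by nearly the same factor, since $x \simeq qT/J$ depends on $q$ and $T$ only through their product.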



F. Annealed versus quenched averages 

The above proofs apply to quenched averages because the binary distribution is constant with respect to the distribution integration. That is, using Eq. (44), we assume a probability distribution $P(\{J_{ij}\})$ and integrate over it to obtain the quenched average free energy per site



-^-^(^)iogm 

N \qT J 



1 

'n 



log 



J ' 



(45) 



but the integrand ($f_0$) is a constant because $J$ is independent of $\{J_{ij}\}$, so the integral trivially simplifies to

$$f[\{J_{ij}\}] = f_0 \int DJ_{ij}\, P(\{J_{ij}\}), \tag{46}$$

where the integral is unity. In a more general model with a defined $\{J_{ij}\}$ probability distribution, the leading order $\log q$ contribution would remain unchanged, but we would obtain correction terms from the integration over the quenched interaction distribution $\{J_{ij}\}$.



G. Free energy of non-interacting cliques for an 
arbitrary weighted Potts model under a large T 
expansion 

We can represent an arbitrary weighted Potts model 
with ferromagnetic and antiferromagnetic interactions. 
That is, we can generally write 

H{{a}) = -IY1 t^^^-^*^- - - ^^^■)] ^(^^'^j)- (47) 
where $a_{ij}$ and $b_{ij}$ are arbitrary "attractive" and "repulsive" edge weights. This summarization includes modularity [47], a Potts model incorporating a "configuration null model" (CMPM) comparison [6] (the most common variation in [6] effectively generalizes modularity), CMPM allowing antiferromagnetic relations [64], "label propagation" [11, 51], an Erdős-Rényi Potts model [6, 65], a "constant Potts model" [46], the weighted form of the APM [8, 31], or a "variable topology Potts model" suggested in [8].

Note that the repulsive weights $b_{ij}$ are important in that they provide a "penalty function" which enables a well-defined ground state for the Hamiltonian for an arbitrary graph. That is, the ground state of a purely ferromagnetic Potts model in an arbitrary graph is trivially a fully collapsed system (perhaps with disjoint subgraphs). Several of the above models incorporate a weighting factor $\gamma$ of some type on the penalty term which allows the model to span different scales of the network in qualitatively similar ways.

We denote the partition function of a graph $G^*$ with $l$ nodes and weighted edges $\{e\}$ by $Z(G^*; v) = \mathcal{K}_l$. We assume that $J_e \ll T$ for all edges $e$, and all pairs of nodes in $G^*$ are connected by a weighted edge $J_e$ (either ferromagnetic or antiferromagnetic). From Appendix E 2, a recurrence relation for the multivariate Tutte polynomial of a general weighted clique is



^-1 



(48) 



k=l 



model of Eq. (47), and E is the total energy of the graph. 
Equations (50) and (51 ) both imply that large q emulates 
large T for an arbitrary Potts model on a weighted graph 
G. That is, if a community detection quality function can 
be expressed in terms of the general Potts model in Eq. 
(47), then large q and large T are essentially equivalent. 



H. Free energy of non-interacting cliques for an 
arbitrary weighted Potts model under a large q 
expansion 

The multivariate Tutte polynomial [66] (see also Appendix E 2 and Ref. [56]) appears in a subgraph expansion over the subsets of edges $A \subseteq E$ in a graph $G = (V, E)$ with a set of vertices $V$ and edges $E$



Z{G;q,v) = q 



N 




\£\ 

/'=i 
(52) 

k{A) is the number of connected components of Ga = 
(F, A) and Ve = exp(/3Je) — 1. For our purposes, Eq. ( [52] ) 
serves as an alternate representation of Zq to facilitate 
the calculation of the large q expansion. 

For large when q^ ^ \^e\^i the last term may ne- 
glect, and for a system of non-interacting cliques of size 
li with i = 1,2,.. the leading order terms in large q 
are 



The partition function for JCi at high T is 



ZiG;q,v) 




(53) 



(49) 



j=2 \ k=l 



Now, we generate a graph consisting of a set of q non- 
interacting cliques of size li where i = 1, 2, . . . , g'. 



i=lj=2 \ fe=l 



(50) 



where we used Ve ^ PJe at high T for general edge 
weights Je (even if Jg < as long as Je <^T). 
The free energy is 



i=l j=2 k=l ^ 



= -Tlogq 



E 
qN 



(51) 



where we invoked log(l + x) ~ x for x <C 1 there. Ei is 
the energy of cluster i according to the weighted Potts 



The approximation is identical to Eq. (49) at high T. 



Ref. [56] calculates an explicit crossover temperature in- 
cluding the last subgraph A = S that competes with the 
large q terms as T ^ 0. The free energy corresponding 
to Eq. (53) becomes 



-T log q 



T 

N 



q h i-1 

EEE 

i=l j=2 k=l 



Vk^ 

q 



(54) 



where we applied the small x approximation log(l - 



In order to illustrate the correspondence in large q and 
T, we fix T = T', define Je^^ = exp(/3' Jg) — 1, and rewrite 
the free energy per site 



/ « -riogq 



N 



= 1 j=2 k=l 



(55) 



Large $q$ in Eq. (52) emulates large $T$ in Eq. (50). As with the unweighted case in Eq. (44) in Sec. VII E, the effective interaction $\exp(\beta' J_e) - 1$ is exponentially weighted in $\beta' = 1/T'$, so a non-zero (perhaps small) region of stability is essentially ensured except in the presence of high noise [5]. We can additionally determine a rigorous bound using methods in [56, 67, 68]



Jo 



log 



p(g-l) 



(56) 



where Jo = ^ Jjo [1 + sgn( Jjo)] is a generous upper 
bound summing only positive energy contributions and 
$p$ is the probability for finding a given spin $\sigma_0$ in a specific spin state $\sigma$. This result further agrees with our conclusions. Note that as $p \to 1/q$, the system is completely disordered, so $T_\times \to \infty$. As $p \to 1$, the system is perfectly ordered, so $T_\times \to 0$.



VIII. CONCLUSIONS

We systematically examined the phase transitions for the community detection problem via a "noise test" across a range of parameters. The noise test consists of a structured graph with a strongly-defined ground state. We add increasing numbers of extraneous intercommunity edges (noise) and test the performance of a stochastic community detection algorithm in solving for the well-defined ground state. Specifically, we studied two types (sequences) of systems. In the first such sequence of systems in Fig. 8, we fixed the ratio $\alpha = q/N$ of the number of communities $q$ to the number of nodes $N$. We fixed $q$ at different values and varied $N$ in the second sequence of systems in Fig. 9. In Fig. 13, we explored the largest tested systems with $N = 2048$ nodes in more detail, where we depicted additional measures to illustrate the transitions. All of these systems showed regions with distinct phase transitions in the large $N$ limit. Deviations occurred most often in smaller systems, indicating a definite finite-size effect.

The spin-glass-type phase transitions in our noise test occurred between solvable and unsolvable regions of the community detection problem. A hard, but solvable, region lies at the transition itself where it is difficult, in general, for any community detection algorithm to obtain the correct solution. We analyzed a system of non-interacting cliques and illustrated that in the large $q$ limit, the system experiences thermal disorder in the thermodynamic limit for any non-zero temperature. When in contact with a heat bath, the asymptotic behavior of the temperatures beyond which the system is permanently disordered varies slowly with the number of communities $q$, specifically, $T_\times \sim O(1/\log q)$. This implies that problems of practical size maintain a definite region of solvability. Given the connection between Jones polynomials of knot theory and Tutte polynomials for the Potts model, our results imply similar transitions in large random knots (see Appendix G).

We further studied the free energy of arbitrary graphs, arriving at the same conclusion. Increasing the number of communities $q$ emulates increasing $T$ in arbitrary graphs for a general Potts model. The effective interaction strength for increasing $q$ scales such that this disorder is circumvented by the often standard use of a simulated annealing algorithm, but the "glassy" (high noise) region remains a challenge for any community detection algorithm.


Appendix A: Definitions: Trials and Replicas 

We review the notion of trials and replicas on which 
our algorithms are based. Both pertain to the use of mul- 
tiple identical copies of the same system which differ from 
one another by a permutation of the site indices. Thus, 
whenever the time evolution may depend on sequentially 
ordered searches for energy lowering moves (as it will in 
our greedy algorithm), these copies may generally reach 
different final candidate solutions. By the use of an ensemble of such identical copies (see, e.g., Fig. 2), we can attain accurate results as well as determine information theory correlations between candidate solutions and infer from these a detailed picture of the system.

In the definitions of "trials" and "replicas" given below, we build on the existence of a given algorithm (any algorithm) that may minimize a given energy or cost function. In our particular case, we minimize the Hamiltonian of Eq. (1).

• Trials. We use trials alone in our bare community detection algorithm. We run the algorithm on the same problem $t$ independent times. This may generally lead to different contending states that minimize Eq. (1). Out of these $t$ trials, we will pick the lowest energy state and use that state as the solution.

• Replicas. We use both trials and replicas in our multi- 
scale community detection algorithm. Each sequence of 
the above described t trials is termed a replica. When 
using "replicas" in the current context, we run the afore- 
mentioned t trials (and pick the lowest solution) r inde- 
pendent times. By examining information theory corre- 
lations between the r replicas we infer which features of 
the contending solutions are well agreed on (and thus are 
likely to be correct) and on which features there is a large 
variance between the disparate contending solutions that 
may generally mark important physical boundaries. We 
will compute the information theory correlations within the ensemble of $r$ replicas. Specifically, information theory extrema as a function of the scale parameters generally correspond to more pertinent solutions that are locally stable to a continuous change of scale. It is in this way that we will detect the important physical scales in the system (Fig. 2).
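The relationship between trials and replicas can be sketched as two nested loops. This is a minimal illustration, not the authors' implementation: `toy_solver` is a hypothetical stand-in for any energy-minimizing algorithm, and the names `best_of_trials` and `run_replicas` are assumptions introduced here.

```python
import random
from typing import Callable, List, Sequence, Tuple

Partition = List[int]
Solver = Callable[[Sequence[int], random.Random], Tuple[float, Partition]]

def best_of_trials(solver: Solver, nodes: Sequence[int], t: int,
                   rng: random.Random) -> Tuple[float, Partition]:
    """Run t trials, each on a permuted node order, and keep the lowest energy."""
    best = None
    for _ in range(t):
        order = list(nodes)
        rng.shuffle(order)  # copies differ by a permutation of the site indices
        energy, partition = solver(order, rng)
        if best is None or energy < best[0]:
            best = (energy, partition)
    return best

def run_replicas(solver: Solver, nodes: Sequence[int], t: int, r: int,
                 seed: int = 0) -> List[Tuple[float, Partition]]:
    """Each replica is one full sequence of t trials; r replicas are returned."""
    rng = random.Random(seed)
    return [best_of_trials(solver, nodes, t, rng) for _ in range(r)]

def toy_solver(order: Sequence[int], rng: random.Random) -> Tuple[float, Partition]:
    """Hypothetical stand-in solver: random energy, fixed two-community split."""
    energy = rng.uniform(-1.0, 0.0)
    partition = [node % 2 for node in sorted(order)]
    return energy, partition

replicas = run_replicas(toy_solver, range(6), t=4, r=3)
print(replicas)
```

The $r$ returned partitions are the objects that would then be compared with information theory measures such as those in Appendix B.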



Appendix B: Information theory and complexity 
measures 

We use information theory measures to calculate correlations between community detection solutions and expected partitions in the noise test problem. To begin, $N$ nodes of partition $A$ are partitioned into $q_A$ communities of size $\{n_a\}$ where $1 \le a \le q_A$. The ratio $n_a/N$ is the probability that a randomly selected node is found in community $a$. The Shannon entropy is

$$H_A = -\sum_{a=1}^{q_A} \frac{n_a}{N} \log_2 \frac{n_a}{N}. \tag{B1}$$

The mutual information $I(A, B)$ between partitions $A$ and $B$ is

$$I(A, B) = \sum_{a=1}^{q_A} \sum_{b=1}^{q_B} \frac{n_{ab}}{N} \log_2 \frac{n_{ab} N}{n_a n_b}, \tag{B2}$$

where $n_{ab}$ is the number of nodes of community $a$ in partition $A$ that are also found in community $b$ of partition $B$. The normalized mutual information $I_N(A, B)$ is then

$$I_N(A, B) = \frac{2 I(A, B)}{H_A + H_B} \tag{B3}$$

with the obvious range of $0 \le I_N(A, B) \le 1$. High $I_N$ values indicate better agreement between compared partitions.
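The three measures can be computed directly from two label lists. The sketch below assumes the standard base-2 definitions of Shannon entropy, mutual information, and the normalization $2I/(H_A + H_B)$; the function names are illustrative, and the convention of returning 1 when both partitions are a single community is a choice made here to avoid division by zero.

```python
from collections import Counter
from math import log2
from typing import Hashable, Sequence

def shannon_entropy(labels: Sequence[Hashable]) -> float:
    """H = -sum_a (n_a / N) log2(n_a / N) over communities a."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def mutual_information(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """I(A, B) = sum_ab (n_ab / N) log2(n_ab N / (n_a n_b)); empty cells contribute 0."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum((n_ab / n) * log2(n_ab * n / (count_a[x] * count_b[y]))
               for (x, y), n_ab in joint.items())

def normalized_mutual_information(a: Sequence[Hashable],
                                  b: Sequence[Hashable]) -> float:
    """I_N = 2 I(A, B) / (H_A + H_B)."""
    h_sum = shannon_entropy(a) + shannon_entropy(b)
    if h_sum == 0.0:
        return 1.0  # both partitions consist of one community (convention chosen here)
    return 2.0 * mutual_information(a, b) / h_sum

same = normalized_mutual_information([0, 0, 1, 1], [1, 1, 0, 0])
indep = normalized_mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
print(same, indep)
```

The first pair of partitions is identical up to relabeling, and the second pair is statistically independent, so the two calls probe the extremes of the range.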



Appendix C: Computational susceptibility 

The complexity $\Sigma(e)$ of the energy landscape is related to the number of states $\mathcal{N}(E) \sim \exp[N\Sigma(e)]$ [17] with energy $E$ and energy density $e = E/N$. In the current analysis, we detect the onset of the high complexity with no prior assumptions or approximations by computing a "computational susceptibility" [8] defined as

$$\chi_n = I_N(s = n) - I_N(s = 4). \tag{C1}$$

That is, $\chi$ measures the increase in the normalized mutual information $I_N$ as the number of trials (number of independently solved starting points in the energy landscape) $s = n$ is increased. Physically, we evaluate how many different optimization trials are necessary to achieve a desired accuracy threshold.

$\chi$ evaluates the expected response of the system to additional optimization effort. That is, a higher $\chi$ indicates that additional optimization effort will likely result in a better solution. A low value of $\chi$ indicates that there will be less improvement from the additional effort, whether due to a trivially solvable system, a complex energy landscape with numerous local minima that trap the solver (at low to moderate temperatures), or thermal-oriented effects of randomly wandering the energy landscape.



Appendix D: Heat Bath Algorithm 

We extend the greedy algorithm in [8, 31] to non-zero temperatures by applying a heat bath algorithm. After we connect the system to a large thermal reservoir at a constant temperature $T$, the probability for a particular node to move from community $a$ to $b$ is set by a thermal distribution [6],

$$P_{a \to b} = \frac{\exp(-\Delta E_{a \to b}/T)}{\sum_d \exp(-\Delta E_{a \to d}/T)}. \tag{D1}$$

$\Delta E_{a \to b}$ is the energy change that results if the node is moved to the new community $b$, and the index $d$ runs over all connected clusters including its current community or a new empty community. The steps of our heat bath algorithm are as follows:

(1) Initialize the system. Initialize the network into a "symmetric" state by assigning each node as the lone member of its own community (i.e., $q_0 = N$).

(2) Find the best cluster for node i. Select a node and determine to which clusters it is connected (including its current community and an empty cluster). Calculate the energy change $\Delta E_{a \to b}$ required to move to each connected cluster $b$. Calculate and sum all Boltzmann weights. Generate a random number between 0 and 1 and determine into which cluster the node is placed.

(3) Iterate over all nodes. Repeat step 2 in sequence 
for each node. 

(4) Merge clusters. Allow for the merger of commu- 
nity pairs based on the same Boltzmann- weighted merge 
probabilities. 

(5) Repeat the above two steps. Repeat steps 2 through 
4 until the maximum number of iterations is reached. 

(6) Repeat all the above steps for s trials. Repeat steps 
1-5 for s trials and select the lowest energy trial as the 
best solution. Each trial randomly permutes the order of 
nodes in the initial state. 
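The node-move selection in step (2) can be sketched as follows, assuming the thermal distribution has the normalized Boltzmann form $P_{a \to b} \propto \exp(-\Delta E_{a \to b}/T)$. The function names are hypothetical, and the energy changes are supplied as a plain list rather than computed from a Hamiltonian.

```python
import math
import random
from typing import List, Sequence

def heat_bath_probabilities(delta_e: Sequence[float],
                            temperature: float) -> List[float]:
    """Normalized Boltzmann weights exp(-dE / T) over the candidate clusters."""
    shift = min(delta_e)  # subtracting the minimum avoids overflow; it cancels on normalization
    weights = [math.exp(-(de - shift) / temperature) for de in delta_e]
    total = sum(weights)
    return [w / total for w in weights]

def choose_move(delta_e: Sequence[float], temperature: float,
                rng: random.Random) -> int:
    """Draw a uniform number in [0, 1) and return the index of the selected cluster."""
    probs = heat_bath_probabilities(delta_e, temperature)
    u = rng.random()
    cumulative = 0.0
    for index, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return index
    return len(probs) - 1  # guard against floating-point round-off

# Candidate moves: stay (dE = 0), a favorable move, and an unfavorable move
probs = heat_bath_probabilities([0.0, -1.0, 2.0], temperature=1.0)
move = choose_move([0.0, -1.0, 2.0], 1.0, random.Random(1))
print(probs, move)
```

As the temperature is lowered, the weights concentrate on the move with the lowest energy change, which is consistent with the statement that the low temperature results approach those of the greedy algorithm.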

This HBA is similar to our greedy algorithm except that we use a random process to select the node moves in steps (2) and (4). The results obtained at low temperature by our HBA are very close to the results obtained by the zero temperature greedy algorithms. Note that there is no cooling scheme as occurs in SA, so step 5 ends at a maximum number of iterations as opposed to an unchanged best partition that is achieved as $T \to 0$ in SA.

In the easy phase, different starting trajectories each begin in the symmetric initial state, but they often lead to the same solution. In the hard phase, changing the random seed may significantly alter the final result of an individual trial because the solver becomes trapped in different local minima. Thus we apply additional trials in order to sample different regions of the energy landscape and arrive at better solutions. In the unsolvable phase, increasing the number of trials $s$ does not substantially change the quality of the solutions unless one happens to sample the energy landscape in the immediate vicinity of the optimal partition, but the probability of doing so is small with a finite number of trials $s$.



Appendix E: Tutte polynomials 

We give a very brief introduction to Tutte polynomials consisting of the essential facts necessary for the derivations presented in this paper. The notation used here is mostly standard, but the notation elsewhere in the text deviates from standard notation in order to facilitate the partition function derivation in Sec. VII D. For an undirected graph $G$, we denote the deletion (removal) of an edge $e$ by $G'_e$ and a contraction of the edge by $G''_e$, where a contraction consists of removing the edge $e$ and merging the corresponding vertices.



where q is the number of clusters or states, v = ex.p{pj) — 
1, G denotes the graph, k{G) is the number of connected 
components in G, \V\ is the number of vertices, x = 
{q -\- v)/v and y = v -\- 1. 

In Sec. VII D, we use the following lemma to derive a high temperature $T$ approximation for a constructed random graph. We denote $K_l$ as the Tutte polynomial for a complete graph, and $K_l^{(d)}$ denotes that the graph has $d$ duplicated (possibly redundant) edges.

Lemma 1. For a clique $K_l^{(d)}$ of size $l$ with $d$ duplicate edges between any pair of nodes, the Tutte polynomial at $y = 0$ is $K_l$.

Proof. Let G be a complete graph with / vertices and 
d = 1 redundant edge. If we delete and contract the 



duplicate edge, the Tutte polynomial t{G) = 



IS 



(1) 



{i-i) 
i-i 



The contracted vertex in the second term contains r 
loop. We cut the loop and have 



1. Unweighted graph G

If $G$ has no edges, the Tutte polynomial is $t(G; x, y) = 1$. If $G$ is a disjoint graph of partitions $A$ and $B$, then $t(G; x, y) = t(A; x, y)\, t(B; x, y)$. When an edge $e$ in an unweighted graph $G$ is "cut," the recurrence relations are [63]:

• For a general edge, $t(G; x, y) = t(G'_e; x, y) + t(G''_e; x, y)$, which is the sum of two graphs where $e$ is deleted and contracted.

• If edge $e$ is an isthmus between two otherwise disconnected regions of $G$, then $t(G; x, y) = x\, t(G''_e; x, y)$ where the edge $e$ is contracted.

• If edge $e$ is a loop (a vertex self-edge), then $t(G; x, y) = y\, t(G'_e; x, y)$ where the edge $e$ is deleted.

The resulting Tutte polynomial is a function of two variables $(x, y)$, and it is independent of the construction order. Different graphs $G$ and $H$ may be described by the same function $t(G; x, y) = t(H; x, y)$. A sample calculation is performed in Appendix F for a circle of complete sub-graphs (cliques) as shown in Fig. 23(b).
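The three recurrence rules translate into a short recursive routine. The sketch below is an illustrative, exponential-time implementation suitable only for very small graphs; the representation of a polynomial as a dictionary from `(power of x, power of y)` to an integer coefficient, and all function names, are choices made here rather than anything taken from the text.

```python
from collections import Counter
from typing import Dict, List, Tuple

Edge = Tuple[int, int]
Poly = Dict[Tuple[int, int], int]  # (power of x, power of y) -> coefficient

def _connected(u: int, v: int, edges: List[Edge]) -> bool:
    """True if v is reachable from u using only the given edges."""
    seen, stack = {u}, [u]
    while stack:
        node = stack.pop()
        for a, b in edges:
            for s, t in ((a, b), (b, a)):
                if s == node and t not in seen:
                    seen.add(t)
                    stack.append(t)
    return v in seen

def _shift(poly: Poly, dx: int, dy: int) -> Poly:
    """Multiply a polynomial by x**dx * y**dy."""
    return {(i + dx, j + dy): c for (i, j), c in poly.items()}

def tutte(edges: List[Edge]) -> Poly:
    """Tutte polynomial by deletion and contraction of the first edge."""
    if not edges:
        return {(0, 0): 1}
    (u, v), rest = edges[0], edges[1:]
    if u == v:                                   # loop: factor y, delete the edge
        return _shift(tutte(rest), 0, 1)
    contracted = [(u if a == v else a, u if b == v else b) for a, b in rest]
    if not _connected(u, v, rest):               # isthmus: factor x, contract the edge
        return _shift(tutte(contracted), 1, 0)
    total = Counter(tutte(rest))                 # general edge: deletion ...
    total.update(tutte(contracted))              # ... plus contraction
    return dict(total)

triangle = tutte([(0, 1), (1, 2), (0, 2)])
print(triangle)
```

For the triangle, the routine reproduces the polynomial $x^2 + x + y$ quoted for a size $l = 3$ clique in Appendix F.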

Tutte polynomials are related to the partition function 
of a ferromagnetic ( J > 0) or antiferromagnetic ( J < 0) 
Potts model given by 



JS{ai, (jj) 



(El) 



for any connected pair of nodes i and j with an inter- 
action strength J. The corresponding partition function 
is 



(E2) 



K, 



(1) 
I 

(1) 



Ki 
Ki 



■ y^i-i 



(E3) 



where we used y = in the second line. 

Now, assume that we can reduce K^"^^ = Ki. Let G 
be a complete graph with / vertices and d + 1 duplicate 
edges. If we cut one duplicate edge, the resulting Tutte 
polynomial t{G) 



kI'+''> is 



(d+i) 



The contracted vertex in the second term contains r > 1 
loops. We cut each loop in sequence and obtain 



(d+i) 
I 

(d+i) 



id) 
I 

id) 



-1) 



(E4) 



Sinc e k[^^ = Ki , we also equate K^"^^^^ = Ki by Eq. 
(|E4|). Equation (E3) shows that the relation holds for 

Ki 

□ 



d = 1; therefore, by mathematical induction Kj^ 
holds true for any integer d>l. 



id) 



2. Weighted graph G 

An excellent summary of multivariate Tutte polyno- 
mials (MVTP) is found in Ref. [66 . The MVTP allows 
for arbitrary weights v = [ve] for the edges {e} of G. If 
G has no edges, the MVTP is Z{G;q,v) = q. For an 
undirected graph G, the weighted Potts Hamiltonian is 



(E5) 

FIG. 23: (Color online) In panel (a), we depict a chain $B_q$ of $q$ cliques (complete sub-graphs of maximally connected clusters) of size $l$ connected by single edges. The corresponding circle of $q$ cliques $C_q$ is depicted in Fig. 21. In panel (b), we show the derivation of the Tutte polynomial in Eq. (F2) for size $l = 3$ cliques. We iteratively break edges and merge nodes according to the Tutte polynomial recurrence relation [63] in Appendix E 1 until we arrive at configurations that are reduced clique circle components. For presentation purposes, gray edges are cut in the next line of the derivation. The dashed gray line at the bottom of each sub-graph represents the remainder of the clique circle which is not touched or affected by the operations on the displayed subgraph.



When an edge $e$ in $G$ is "cut," the recurrence relation is

$$Z(G; q, v) = Z(G'_e; q, v) + v_e Z(G''_e; q, v), \tag{E6}$$

where $J_e$ corresponds to the edge weight between two nodes $i$ and $j$ and $v_e = \exp(\beta J_e) - 1$.

As with the unweighted case, if $G$ is a disjoint graph of partitions $A$ and $B$, then $Z(G; q, v) = Z(A; q, v)\, Z(B; q, v)$. If partitions $A$ and $B$ are joined at a single vertex, then $Z(G; q, v) = Z(A; q, v)\, Z(B; q, v)/q$. Unlike Eq. (E2) for unweighted graphs, Eq. (E6) holds for loops or bridges, but for concreteness, cutting an isthmus $e$ yields

$$Z(G; q, v) = (1 + v_e/q)\, Z(G'_e; q, v), \tag{E7}$$
$$Z(G; q, v) = (q + v_e)\, Z(G''_e; q, v), \tag{E8}$$

where $e$ is deleted or contracted, respectively. If $e$ is a loop, then

$$Z(G; q, v) = (1 + v_e)\, Z(G'_e; q, v). \tag{E9}$$

Note that the MVTP is the partition function. That is, there are no prefactors of $q$ or $v_e$. Finally, if two parallel edges connect the same pair of nodes $i$ and $j$ with weights $J_1$ and $J_2$, then $Z$ is unchanged if we replace the parallel edges by a single edge with a weight $J_1 + J_2$ (this negates the need for lemma 1 above).
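The parallel-edge rule follows in one line from the edge factors. The worked step below is a sketch that assumes the usual Potts convention in which an edge $e$ between nodes $i$ and $j$ contributes a Boltzmann factor $e^{\beta J_e \delta(\sigma_i, \sigma_j)} = 1 + v_e\,\delta(\sigma_i, \sigma_j)$; the symbol $v_{\mathrm{eff}}$ is introduced here for illustration.

```latex
\begin{align}
\bigl[1 + v_1\,\delta(\sigma_i,\sigma_j)\bigr]\bigl[1 + v_2\,\delta(\sigma_i,\sigma_j)\bigr]
  &= 1 + (v_1 + v_2 + v_1 v_2)\,\delta(\sigma_i,\sigma_j)
  && \text{since } \delta^2 = \delta, \\
v_{\mathrm{eff}} &= (1 + v_1)(1 + v_2) - 1
   = e^{\beta (J_1 + J_2)} - 1 .
\end{align}
```

The combined factor is therefore that of a single edge with weight $J_1 + J_2$, so every spin configuration keeps the same weight and the partition function is unchanged.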



Appendix F: Derivation of the Tutte polynomial for
a circle of cliques

As depicted in Fig. 21, we define $C_q$ as a circle of $q$ cliques where we focus on those of size $l = 3$ for the current derivation. The Tutte polynomial for a triangle is $A = (x^2 + x + y)$. For convenience, we also define $A' = (A + x + 1) = [(x + 1)^2 + y]$ and $y' = (x + y + 1)$.

We define $B_q$ to be the Tutte polynomial for a clique chain as depicted in Fig. 23(a). In this case, it is trivial to construct $B_q$,

$$B_q = A^q x^{q-1}. \tag{F1}$$

With Eq. (F1), we construct a recurrence relation for the clique circle configurations as shown in Fig. 23(b),

$$C_q = B_q + x(x + 1)\, B_{q-1} + (x + y + 1)\, C_{q-1}. \tag{F2}$$
We make a final high T approximation



) ^ (X + 1) 



g+1 ^2g-3 



(F6) 



using (x^^-i - 1) x^^-i and (1 - x)'^ (I ^ x) jx^. 



FIG. 24: (Color online) Panel (a) depicts the trefoil knot, and panel (b) shows the corresponding graph $G$ constructed from the distinct knot regions and crossings [69]. That is, nodes correspond to "checkerboard-shaded" regions (shade the outside lobes of the trefoil knot leaving the interior region unshaded), and edges correspond to knot crossings. Jones polynomials $V_J(x)$ in knot theory are related to Tutte polynomials, and Eq. (G1) represents the trefoil knot corresponding to the triangle subgraph in panel (b).



From this relation, we can sum the series exactly. 
= 5, + A%_i+x(x + l)(x + y + l)5g_2 

+ (x + y+l)'C,_2 
g-4 

= 5, + A'^(x + 7/+l)'5,_,_i 

+ A (x + ^ + 1)^-' B^^ix^y^ \f-^ C2. (F3) 

Note that the last Bj term uses A not A^ Also, it can be 
shown that C2 = (x + 1)^ (x^ + A) + (x + 1) A. Sub- 
stituting these values into the equation, we arrive at 



Appendix G: Random knot "transitions" 



A general 3D knot may be represented as a 4-valent planar graph [69] [i.e., corresponding to a two-dimensional (2D) square lattice connectivity allowing self-loops]. This relation connects the Tutte polynomial to the Jones polynomial in knot theory. Conversely, all connected, signed planar graphs have a corresponding link diagram representation (2D knot projection). Alternating over-under crossings result in unsigned planar graphs [69] (e.g., the trefoil knot in Fig. 24). Ref. [70] provides an introduction to the mathematics and physics of knot theory. The Jones polynomial of a given knot is intimately related to quantum field theories [71] via its connection to [an SU(2) type] Wilson loop associated with the same knot.

As a concrete example, Fig. 24(a) depicts a simple trefoil knot which is related to the triangle clique depicted in Fig. 24(b) [69]. The Tutte polynomial of Fig. 24(b) is $K_3(G; x, y) = x^2 + x + y$. Then we generate the Jones polynomial



g-4 



= x^-^a^ + a'^^'V-^-^a^-^-i + x^'^-'a^ 



i=0 

-f xy"^{x'^ + X + 1) + yy'^'^A'. 



(F4) 



In the high temperature T limit, y <C so we approxi- 
mate y — 0, and the equation simplifies to 



— X {x -\- ly 



^ H h x^ + X + 1 



1-x 



(F5) 



Vj{x) = x^ 



1 

X 



(Gl) 



where we used $xy = 1$ because the trefoil knot has alternating crossings [72]. While the trefoil knot is clearly not random, we conjecture that the transitions detected in random graphs with embedded ground states in the current work can have similar transition repercussions in random knots.